


VOL. XXVI JANUARY, 1935 NO. 1 


The Journal of Educational 
Psychology 


Devoted Primarily to the Scientific Study of Problems of Learning and Teaching 





CONTENTS 
Prima Facie Validity in Character Tests .................... 1
GOODWIN WATSON AND GEORGE FORLANO
[title illegible] ........................................... 17
JACOB TUCKMAN
An Experimental Study of the Old and New Types of Examination:
[subtitle illegible] ....................................... 30
GEORGE MEYER
The Effect of Improvement in Reading Ability on Intelligence Test
Scores ..................................................... 41
J. W. HAWTHORNE
The Relation of Chronological Age to Achievement in the Study of
[illegible] ................................................ 52
F. H. FINCH AND OLIVER R. FLOYD
Superstition and Personality ............................... 59
JAMES PAGE
Some Measurements of the Effects of Reviews ................ 65
H. A. PETERSON, MARY ELLIS, NORINE TOOHILL, AND PEARL KLOESS
A New Device that Scores Tests ............................. 73
NOEL B. CUFF
[title illegible] .......................................... 78


$6.00 per Year - Published Monthly September to May 


WARWICK & YORK, INC. 


BALTIMORE, MD. 


Entered as Second Class Matter Nov. 15, 1921, at the Post Office at Baltimore, Md. 
under the Act of March 3, 1879; additional entry as Second Class Matter at York, Pa. 














THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Established 1910


EDITORS


Jack W. Dunlap
Fordham University


Harold E. Jones                  Percival M. Symonds
University of California         Teachers College,
                                 Columbia University


H. E. Buchholz, Managing Editor





The price of the Journal is $6.00 a year in the United States; $6.40 in
foreign countries. Part-year subscriptions are 90 cents for each num- 
ber ordered. Back volumes are $7.00 each; back issues are $1.10 each 
except when more than five years old, and then $1.20 each. 


Subscribers should notify the publishers of change in address at 
least four weeks before publication of the issue with which the change is 
to take effect. Claims for non-receipt of an issue will not be honored 
unless made within two weeks after receipt of the next succeeding 
number. 


Unsolicited manuscripts should be accompanied with return post- 
age. Manuscripts, books and other materials for review, and corre- 
spondence regarding editorial and business matters should be addressed 
to the Journal. 


WARWICK AND YORK, Publishers, BALTIMORE, MD.


An Experimental Study of Certain Factors Affecting
Transfer of Training in Arithmetic


By James Robert Overman


An investigation of the effect of instruction on three types of examples in two-place
addition upon the pupils’ ability to handle closely related types in addition and subtraction, 
and to determine whether the amount of transfer is a function of the method of teaching. 
The problem was to ascertain if the amount of such transfer can be increased by helping 
the pupils consciously to generalize the process and to formulate, from the type taught, a 
general method of procedure applicable to related types. 


$3.00 plus 12¢ postage 
WARWICK AND YORK, Publishers, BALTIMORE, MD. 

















































Volume XXVI          January, 1935          Number 1








PRIMA FACIE VALIDITY IN CHARACTER TESTS 


GOODWIN WATSON AND GEORGE FORLANO 
Teachers College, Columbia University 


The greatest problem in character measurement remains the gap 
between the objectives of character education and the instruments of 
measurement. Tests measure a variety of interesting aspects of 
personality, but seldom can be said to have unimpeachable bearing 
upon what enlightened educators wish to achieve. 

As a step toward determining the validity of some available 
measures, six hundred twenty-nine sample items from thirty-one tests 
or scales were submitted to judges. The judges were one hundred 
fifty graduate students, most of them experienced educators, who had 
nearly completed a course in the Psychology of Character, who had 
read the Tenth Yearbook of the Department of Superintendence, dealing 
with Character Education, and also Hartshorne’s book, Character 
in Human Relations. Both books offer critical discussions of character 
objectives. 

The following table indicates the tests used and gives for each a 
sample of the kind of item submitted to the judges. 


TABLE I

TEST
    SAMPLE ITEMS

Maller self-marking test (honesty)
    Positive: X in scoring his own paper reported the correct total.
    Negative: X in scoring his own paper which was actually 40 per cent
    right, reported it was 90 per cent right.

Coordination (honesty)
    Positive: Given a task to accomplish with eyes closed, although it
    leads to failure, X keeps eyes closed tight.
    Negative: Given a task to accomplish with eyes closed, X succeeds in
    doing it by peeking a little but in a way that might not be noticed
    by the teacher.

Guess who (reputation)
    Positive: The majority of the class voted that X is someone who is
    always doing little things to make others happy.
    Negative: The majority of the class voted that X is a crabber and
    knocker. Nothing is right and he is always kicking and complaining.

Check list (teacher rating)
    Positive: The teacher reports that X is cheerful.
    Negative: The teacher reports that X is a bluffer.

Opinion ballot (moral knowledge)
    Positive: X believes that he is that type of boy or girl who would
    help someone even if he had to give up something he wanted for
    himself.
    Negative: X believes that an honest boy would steal if he wasn’t
    likely to get caught.

Moral knowledge
    Positive: X thinks it is his duty to apologize if he has been rude or
    discourteous.

Rogers (personality adjustment)
    Positive: X thinks that when he grows up he would like to be one of
    the leaders in whatever town he lives.
    Negative: Of three wishes, X checks the statement which says, “I
    would like to have a different father and mother” as his first wish.

Ruggles (distraction)
    Positive: X accomplishes just as efficient simple arithmetic on a
    sheet the margins of which contain funny drawings, jokes, riddles,
    etc. as on a plain sheet.
    Negative: X is considerably slowed down in the accomplishment of
    simple arithmetic if, instead of a plain sheet of paper, the margins
    are filled with distracting cartoons, jokes, riddles, etc.

Forlano (adult characterizations)
    Positive: X faces reality fairly well.
    Negative: X is rude to his mother and told her that he did not care
    for her.

Sweet (personal attitudes)
    Positive: X’s ideals of what children ought to like or dislike agree
    closely with those of other pupils in the same group.
    Negative: X reports a series of actual likes and dislikes which
    differ to an unusual degree from what he believes ought to be liked
    or disliked.








Questionnaire for mothers
    Positive: Mother describes the habitual behavior of X as happy.
    Negative: Mother describes the habitual behavior of X as stubborn.

Maller (cooperation)
    Positive: X works much harder at simple addition problems when the
    achievement counts for the class than when the achievement counts
    for himself.
    Negative: X works much harder at simple addition problems when the
    achievement counts for himself, than when the credit goes to a
    general class score.

Kits (service act)
    Positive: X is more willing than the average child to give away a
    small kit of articles to pupils having none.
    Negative: X is not willing at all to make scrapbooks for sick
    children.

Forlano (selected items from case studies)
    Positive: X is very even-tempered.
    Negative: X is vain.

Conduct record (teacher’s rating)
    Positive: An adult observer thinks X responds to trust imposed.
    Negative: An adult observer thinks X is totally blind to the
    consequences of his acts on himself or others.

Hayes (scale of observed classroom behavior)
    Positive: X shows work to visitor of his own accord.
    Negative: X laughs when another pupil makes a mistake, for example in
    pronouncing a long word.

Good citizenship (moral knowledge)
    Positive: On paper and pencil test, X is confronted with the
    following situation. A boy is being teased by some other children.
    X thinks that the most sensible and useful thing to do would be to
    prepare something everybody can play, so they won’t think about
    teasing.
    Negative: X is asked to suppose this situation. Someone living on the
    street where the children are playing tells the children that there
    is a sick lady in the house and asks them to be quiet. Whereupon X
    checks the following answer as the most sensible and helpful,
    namely, “To ask that she be sent to a hospital.”





Hayes questionnaire (self-report on health, home, school, recreation,
life plan)
    Positive: X has planned what he will do when he grows up.
    Negative: X is earning money but not saving any of it.

Stunt party (inhibition)
    Positive: While playing games at a party, X is always able to wait
    for signals.
    Negative: While playing games at a party, X is unable to wait for
    signals.

Burdick apperception (cultural background)
    Positive: X knows more about etiquette than the average child.
    Negative: X knows little about good books, magazines, literature and
    music.

Attitudes S-A (exaggeration for approval)
    Positive: When X is confronted with the question, “Did you ever feel
    that you would like to ‘get even’ with another person for something
    they had done?” X gave yes as an answer.
    Negative: When X is asked to answer the following question, “When you
    see other children fighting do you stop them?,” he answers that he
    would.

Woodworth-Mathews (symptoms of maladjustment)
    Positive: X thinks people like him as well as they do other people.
    Negative: X feels a certain pleasure in hurting an animal.

Stunt parties (honesty)
    Negative: In a game at a party X cheats in order to increase his
    score, where prizes are given for the winner.

Maller (persistence)
    Positive: After twenty minutes of continuous effort X is still
    working simple addition problems as rapidly as she or he did during
    the first two minutes of work.
    Negative: After twenty minutes of continuous effort X has fallen off
    considerably in the rate at which simple addition problems are being
    performed.

Puzzles (honesty)
    Negative: When asked to do a pegboard puzzle that is not completely
    solvable X faked a solution.

Stories (persistence)
    Positive: X persists in trying to solve a mechanical puzzle longer
    than the average child.
    Negative: X does not persist as long as the average child in trying
    to solve a paper and pencil test.





Stories (inhibition)
    Positive: X is allowed to read part of an adventure story. Then he is
    asked to either finish the story or count dots to make a good score.
    He counts the dots.
    Negative: X is allowed to read part of an adventure story. Then he is
    asked to either finish the story or count some dots to make a good
    score. He finishes the story.

Athletic contest (honesty)
    Negative: On an individual athletic contest in hand grips, X cheated
    in recording and reporting his own score.

Puzzles (inhibition)
    Positive: A box of puzzles is placed on X’s desk and he is asked not
    to touch them. X keeps from handling them.
    Negative: In a contest in opening a safe, X handled the dial before
    the signal was given to begin.

Self-scoring speed test (honesty)
    Negative: X surreptitiously improves his score on a speed test in
    simple additions by adding answers after time is called.

Self-scoring tests of intelligence and achievement, C.E.I. (honesty)
    Negative: X cheats on an arithmetic test when he has the opportunity.


The six hundred twenty-nine items were jumbled and not identified 
with any test. The judges were asked to rate each item using the 
following scale. 


RATING SCALE 


On the following pages are items revealed about children by tests and case 
studies. Some represent actual conduct, others the pupil’s verbal response 
(what he says or thinks he ought to do), still others what a teacher or adult 
says about the child. 

Assume that each statement tells one thing known to be true of a twelve
year old boy or girl, “X.” Place before each one number from the following:

-3 means: Sign of seriously defective character, very bad.
-2 means: Sign of defective character; bad.
-1 means: Slight indication of defective character; more undesirable than
   desirable.
 0 means: No clear significance for character; non-moral.
 ? means: Ambiguous; might be good in some cases, bad in others.
 1 means: Slight indication of desirable character; more good than bad.
 2 means: Sign of good character; definitely desirable.
 3 means: Sign of excellent character; very marked superiority.
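
A minimal sketch of how one item's ratings on this scale might be summarized. Table III reports a judges' median and a Q for each item, but the text does not say how either was computed or how "?" responses were treated. The sketch therefore rests on three assumptions: "?" responses are set aside; the median is interpolated within unit-wide intervals (several medians in Table III exceed 3, which a plain median on this scale could not); and Q is the semi-interquartile range. The function name and the sample ratings are hypothetical.

```python
from statistics import median_grouped, quantiles

VALID = {-3, -2, -1, 0, 1, 2, 3}

def summarize_item(raw_ratings):
    """Return (interpolated median, Q) for one item's ratings."""
    # Assumption: "?" (ambiguous) responses are dropped, not scored as 0.
    numeric = [int(r) for r in raw_ratings if r != "?"]
    if not numeric or any(r not in VALID for r in numeric):
        raise ValueError("ratings must lie on the -3..3 scale")
    # Assumption: each integer rating stands for a unit-wide interval,
    # so a rating of 3 covers 2.5 to 3.5 and the median is interpolated.
    med = median_grouped(numeric, interval=1)
    # Assumption: Q is the semi-interquartile range, (Q3 - Q1) / 2.
    q1, _, q3 = quantiles(numeric, n=4, method="inclusive")
    return med, (q3 - q1) / 2

# Hypothetical ratings from nine judges for a single item.
ratings = ["3", "3", "2", "3", "?", "2", "3", "1", "3"]
med, q = summarize_item(ratings)
print(f"median={med:.2f}  Q={q:.2f}")
```

A small Q means the judges clustered tightly around the median; a large Q means they disagreed about the item's significance.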


Is it correct to use such judgments (eventually extended to include 
more groups of judges from more varied backgrounds) as indices of 
validity? 

Validity concerns the extent to which a test measures what it 
purports to measure. These tests and scales are supposedly getting 
at important aspects of character. How shall we test the tests? 
“Correlate the results with a criterion!” is one common answer. But
with what criterion? When we seek suitable criteria, we must ask for
items of behavior or reputation that are generally accepted as representing
good or bad character. In other words we must do just what this
study attempts. We must get competent persons to tell us what they
regard as prima facie evidence of goodness or badness. In this study 
we seek such criteria from among the items already included in tests 
and scales. If we find some of these regarded as clear indicators of 
desirable or defective character, we shall have the advantage that 
measures corresponding to our criteria already exist. 

Items given a low rating are not necessarily poor. Maybe certain 
ones can be shown to have a close correlation with the type of item 
here given a high (positive or negative) rating. If so, then these items 
must be accepted as valid to the degree that they correlate with 
acceptable criteria. Items rated high can be regarded as criteria. All 
measures of validity must eventually go back to some such standard. 
Rarely is the standard itself derived and tested as in this study. 
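
The claim that a low-rated item is valid "to the degree that" it correlates with an accepted criterion can be illustrated with a product-moment correlation. This is only a sketch: the text names no particular coefficient, and the pupil scores below are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six pupils: a criterion item that the judges
# rated high, and a candidate item that the judges rated low.
criterion = [3, 1, 2, 0, 2, 3]
candidate = [12, 5, 9, 4, 8, 11]
print(f"r = {pearson_r(criterion, candidate):.2f}")
```

On this reading, a candidate item with a high r against the criterion would inherit some of the criterion's validity even though its own face rating was low.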

The results may be used in several ways. One outcome is the 
determination of the relative value of certain published tests. The 
average value assigned to all items representing any one test was taken 
for the test as a whole. The values are reported in Table II, not in 
original form, but as decile positions within the total group of thirty- 
one tests appraised. 

Example: The Maller Self-marking Test, presumably measuring
classroom honesty, was represented in the rating by three typical
items. If the child responded to such items in the positive or desirable
fashion, this appealed to the judges as worthy of a rating so high, that
the test stands in the tenth or top decile among the thirty-one. If
the child responded by cheating, this behavior was judged as distinctly
bad, ninth (or next to highest) decile among the tests in revealing
badness. Combining the two, we give the test a decile rating of ten.
The tests are arranged in order of merit according to this column
headed “Combination.” The column labelled “Agreement of
Judges” is also in deciles and is based on the Q of judges’ ratings. The
closer the agreement, the higher the decile here recorded. On tests
represented by more than fifteen items, a measure of agreement among
the average item-ratings was calculated, and this is given in column
VI. The higher this decile rating the more homogeneous the items
appeared in the judgment of this jury.
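
The step from item ratings to a decile position for each test can be sketched as follows. The text says only that item values were averaged per test and reported as deciles within the group of tests; the tie-breaking rule, the ceiling convention used to assign deciles, and the test names and values below are all assumptions made for illustration.

```python
from statistics import mean

def mean_value_per_test(item_values):
    """Average the item ratings belonging to each test."""
    return {name: mean(vals) for name, vals in item_values.items()}

def decile_positions(values):
    """Rank tests by value and map each rank to a decile, 1 (low) to 10 (high)."""
    ordered = sorted(values, key=values.get)  # lowest value first
    n = len(ordered)
    # Integer ceiling of rank * 10 / n, so the top test always lands in decile 10.
    # Assumption: ties are broken arbitrarily by sort order.
    return {name: (rank * 10 + n - 1) // n
            for rank, name in enumerate(ordered, start=1)}

# Hypothetical item ratings for five hypothetical tests.
item_values = {
    "A": [2.8, 2.6],
    "B": [1.0, 1.4],
    "C": [0.2],
    "D": [2.0, 2.2],
    "E": [1.6],
}
print(decile_positions(mean_value_per_test(item_values)))
```

With thirty-one tests, as in the study, roughly three tests would fall in each decile under this convention.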

It appears that items such as those represented in the Maller 
Self-marking Test, the Coordination Test, the Guess-who rating scale
for reputation among classmates, the Teachers Check List, and the 
Opinion Ballot B (all of these except the first, developed by Hartshorne 
and May in the Character Education Inquiry) are given top place. 
Happily, for educators interested in character testing, each of these 
tests is easily administered. Moreover they tend to supplement one 
another fairly well. The two first mentioned are both conduct tests 
of honesty, but at least two such tests may well be included because 
honesty behavior is regarded as of the essence of character, and because 
the correlation of each with the other is probably not higher than .40. 
This short battery of the most valid tests, by this criterion, could be 
administered to a class and would require about two and one-half to 
three hours of pupil time. 

Table II may also serve as a rough guide in weighting tests used in 
larger batteries. Scores from tests standing near the top may well be 
given more influence than scores from tests near the bottom, if the 
purpose is to get at “general all-around character.” Of course these
data cannot show exactly how much weight to give each test. 
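
One way such rough weighting might look in practice is a weighted mean of a pupil's standardized scores, with the Combination deciles of Table II used as weights. This is purely illustrative: the text states that the data cannot fix exact weights, and the function name, the dictionary keys, and the pupil's z-scores are hypothetical.

```python
def weighted_composite(z_scores, weights):
    """Weighted mean of one pupil's standardized test scores."""
    total_weight = sum(weights[name] for name in z_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(z * weights[name] for name, z in z_scores.items()) / total_weight

# Weights taken from the Combination column of Table II;
# using deciles directly as weights is an assumption.
weights = {
    "maller_self_marking": 10,
    "maller_persistence": 5,
    "stories_inhibition": 2,
}

# Hypothetical standardized scores for one pupil.
pupil = {
    "maller_self_marking": 1.0,
    "maller_persistence": -1.0,
    "stories_inhibition": 0.0,
}
print(round(weighted_composite(pupil, weights), 3))
```

A test near the top of the table pulls the composite about five times as hard as one near the bottom under this scheme; any other monotone weighting would respect the same ordering.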

Most of the tests show close correspondence between the 
significance of positive and negative responses. The discrepancies 
appear to have a rational foundation. For example: 

Coordination.—Honesty in this test is strong evidence for good character; 
peeking is not to be so severely condemned. 

Rogers’ Test of Personality Adjustment.—Good adjustment may not mean good
character, but maladjustment is likely to indicate very unsatisfactory character.

Maller Cooperation.—Working hard for the class is a sign of good character; 
failure to do so may have a variety of explanations. 

Maller Persistence.—Keeping the rate of work steady although the task is
wearing is a sign of good character; showing fatigue or lack of interest is not so
much a sign of bad character.

Ruggles Distraction.—Being able to resist the distracting jokes and designs is 
approved; yielding to them is regarded as a very trivial matter. 

Attitudes S-A.—Exaggerating one’s goodness is disapproved; admission of 
one’s faults is regarded, however, as a very minor virtue. 










TABLE II. RELATIVE CHARACTER SATURATION OF TESTS, AS DETERMINED FROM
RATING OF SAMPLE ITEMS

(Ratings Are in Deciles, with Ten as High)

Columns: I, No. of items rated; II, Positive value; III, Negative value;
IV, Combination; V, Agreement of judges; VI, Agreement of items.
[?] marks an entry illegible in the source.

Test                                                     I    II   III    IV     V    VI

Maller self-marking (honesty)                            3    10     9    10     7
Coordination, peeking (honesty)                          5    10     7    10     4
Guess who (reputation)                                  38    10     9    10     7     5
Check list (teacher rating)                             10     8    10   [?]     9
Opinion ballot B (moral knowledge)                      17     9     6     9     5     3
Good citizenship (moral knowledge)                      10     9     7     8     4
Forlano (items from case studies)                       17     8     6     8     5     1
Rogers (personality adjustment)                         10     4     9     7     2
Forlano (adult characterizations)                       32     6     5     7     6     8
Questionnaire for mothers                               17     6     5     7     4     4
Kits (service act)                                      10     6     5     6     9
Conduct record (teacher’s rating)                      110     5     4     6     5     6
Maller (cooperation for class)                           4     7     2     5     2
Maller (persistence)                                     2     7     2     5     1
Puzzles (inhibition)                                     3     5     4     5    10
Stories (persistence)                                    8     5     4     5     9
Ruggles (distraction)                                    2     7     1     4     1
Hayes (observed classroom behavior)                     96     4     3     4     8     2
Attitudes S-A (exaggeration for approval)               10     2     5     4     1
Woodworth-Mathews (symptoms of maladjustment)           73     3     5     3     6    10
Stunt party (inhibition)                                 3     4     4     3     1
Hayes questionnaire (self-report on health, home,
  school, recreation, life plan)                        95     3     3     3     5     7
Sweet (personal attitudes: self-others-ideal,
  and relationships)                                    18     3     3     3     1     4
Burdick apperception (cultural background)              13     1     2     2     2
Stories (inhibition)                                     4     1     1     2     1

Tests with Data Only for Positive or Negative Items

Stunt parties (honesty)                                  1          10           7
Athletic contest (honesty)                               4           9           7
Puzzles (honesty)                                        2           8           7
Self-scoring speed test (honesty)                        6           8          10
Moral knowledge                                          1     7                 7
Self-scoring tests of intelligence and
  achievement, CEI (honesty)                             5           7           9










Another method of analyzing the ratings is to select items of
exceptional merit, disregarding the tests in their present form, and
looking forward to a new composite test made up exclusively of the
most valid items. Results are presented in Table III. Items are
arranged in order of their significance, those furthest from zero (zero
meaning: No clear value for character) standing first in the list.
Positive items (i.e., those showing presumably good behavior) are
listed first, then negative items, for the two scales may not be logically
comparable. We cannot be sure that a “badness” rating of -2.5 is
equivalent, for testing, to a “goodness” rating of +2.5.
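
The selection rule described here, ordering items by distance from zero and keeping the positive and negative lists apart, can be sketched as below. The four medians are taken from Table III; the fifth item, the shortened labels, and the function name are hypothetical.

```python
def strongest_items(medians, top=25):
    """Split items by sign and order each list by distance from zero."""
    positive = sorted(
        (kv for kv in medians.items() if kv[1] > 0),
        key=lambda kv: kv[1],
        reverse=True,
    )
    negative = sorted(
        (kv for kv in medians.items() if kv[1] < 0),
        key=lambda kv: kv[1],
    )
    # The two lists are returned separately because a rating of -2.5
    # may not be equivalent, for testing, to a rating of +2.5.
    return positive[:top], negative[:top]

medians = {
    "original, creative work of own accord": 3.26,
    "liked by other children": 2.76,
    "very selfish person": -2.59,
    "wholly untrustworthy": -1.75,
    "hypothetical item of no clear significance": 0.0,
}
best_positive, best_negative = strongest_items(medians, top=2)
print(best_positive)
print(best_negative)
```

Items with a median of exactly zero fall into neither list, which matches the scale's reading of zero as having no clear value for character.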

The diverse items of Table III can be said to emphasize a few gen- 
eral aspects of character. These are discussed here mainly as a basis 
for making better character tests, but it is clear that they also set 
objectives regarded as valid for character education. Both methods 
of character training and methods of measurement must go back to 
what the enlightened and influential public regard as sound objectives. 
Reading through the best twenty-five positive (i.e., “good character”)
items, we first notice how often a reputation for reliability,
trustworthiness and dependability has been singled out by the judges
for emphasis. (See Table III, Items 2, 3, 5, 6, and 7.) The same
trait name can be applied to the behavior rated high in Items 10, 14,
15, 16, 18, 20. Whatever the changes in values being brought about
by the shifting social scene, there is no decrease in the demand for
dependability and integrity. “Men whom the spoils of office cannot
buy” are needed in every form of government and economics. But
can we do better than use reputation? Are tests possible? Despite 
the creative contributions of Voelker, Raubenheimer, Hartshorne, 
May, Maller and others, we cannot yet feel confidence that our 
classroom tests of honest behavior give an adequate sample of the 
strains of modern business and finance. It is worth noting that as 
teaching methods improve, less and less emphasis is placed on marks 
and grades, hence the school cheating tests offer less temptation to 
children, and are correspondingly of less value in showing how well 
honesty will stand up under real strain. This progress in educational 
method is welcome, even if it does take the vitality out of our tests. 

How are we to get improved measures of this essential dependability?
Our suggestion would be that the first need is to relate tests to a sound
picture of the psychological situation in the life of the child. We must
be sure that the test calls for him to choose between an act leading to a
strongly desired personal gain, and one which is personally and temporarily
hard to bear, but which contributes to group welfare. This is
the algebra of it. The concrete content will vary at different ages
and with children from different cultures. The essential is to keep
close to genuinely powerful desires on the one hand, and to genuinely
valuable social consequences on the other. Existing tests usually
present a conflict that is too trivial, between drives that are too feeble
and values that are too artificial. A good series of tests of genuinely
dependable, reliable, honest, socially-minded behavior, based on
major urges at each age level, is probably the most needed development
in character testing.


TABLE III. MOST VALID ITEMS FOR GENERAL CHARACTER

Each item is followed by the judges’ median and the Q of the ratings.
[illegible] marks text that cannot be read in the source.

Item                                              Judges’ median   Ratings Q

Positive

 1. X does a piece of original, creative work of own accord ... 3.26  .46
 2. X is regarded by others as dependable, honest, social-minded, and
    enterprising ... 3.16  .52
 3. [illegible] ... 3.13  .52
 4. X has a reputation among his classmates for being always ready to
    work or play with the rest, even when he can’t have his own
    way ... 3.07  .54
 5. An adult observer thinks X is invariably honest, even at a sacrifice;
    protects [illegible] ... 2.94  .58
 6. An adult observer thinks X can be placed entirely on own
    responsibility ... 2.94  .61
 7. An adult observer thinks X is thoroughly dependable; always keeps
    word [illegible] ... 2.93  .58
 8. [illegible] ... 2.93  .65
 9. On paper and pencil test, X is confronted with the following
    situation. A boy is being teased by some other children. X thinks
    that the most sensible and useful thing to do would be to prepare
    something everybody can play, so they won’t think about
    teasing ... 2.92  .55
10. [illegible] ... 2.91  .55
11. X has a reputation among his classmates for being the best all-around
    student [illegible] ... 2.88  .64
12. X has a reputation among his classmates for being decent and clean in
    all his [illegible] ... 2.83  .54
13. The majority of the class voted that X is someone who is always doing
    little things to make others happy ... 2.83  .54
14. Given a task to accomplish with eyes closed, although it leads to
    failure, X keeps eyes closed tight ... 2.82  .64
15. The majority of the class voted that X can always be trusted to tell
    the truth [illegible] ... 2.82  .66
16. X in scoring (his) or (her) own paper reported the correct
    total ... 2.77  .58
17. X is liked by other children and has many friends ... 2.76  .49
18. X is asked to suppose that, “James picked up a library book that some
    child had lost. It was a book he wanted to own.” X believes that the
    most useful and sensible thing to do is to take it to the
    library ... 2.76  .63
19. X has a reputation among his classmates for obeying parents and
    teachers, keeping all rules and trying to do the right thing all the
    time ... 2.74  .61
20. X believes that he would rather wear his shabby clothes than steal
    money to [illegible] ... 2.74  .61
21. X has a reputation among his classmates for always doing little
    things to [illegible] ... 2.72  .52
22. X has a reputation among his classmates for never being too busy to
    help [illegible] ... 2.72  .55
23. An adult observer thinks X is absolutely fair; leads others to fair
    play ... 2.70  .54
24. An adult observer thinks X sticks to a task until finished even if
    disagreeable [illegible] ... 2.70  .56
25. X works much harder at simple addition problems when the achievement
    counts for the class than when the achievement counts for (himself)
    (herself) ... 2.68  .67

Negative

 1. The majority of the class voted that X is a very selfish person who
    will not [illegible] ... -2.59  .38
 2. X is regarded by others as a wicked criminal and
    reprobate ... -2.26  .29
 3. [illegible] ... -2.00  .56
 4. X has a reputation among his classmates for telling disgusting
    stories and [illegible] ... -1.87  .58
 5. X believes the following statement to be true “The main thing in this
    world [illegible] ... -1.85  .61
 6. X is that type of boy or girl who would like to have no
    friends ... -1.78  .64
 7. An adult observer thinks X is wholly untrustworthy ... -1.75  .67
 8. X in scoring (his) or (her) own paper which was actually only 40 per
    cent right, reported it was 90 per cent right ... -1.65  .50
 9. X has a reputation among his classmates for not being trustworthy
    about [illegible] ... -1.61  .48
10. An adult observer thinks X is utterly without sense of obligation;
    faithless ... -1.61  .58
11. X is regarded by others as vulgar, rude, discourteous and
    indiscreet ... -1.59  .54
12. X has a reputation among his classmates for not obeying any rule if
    he can [illegible] ... -1.58  .54
13. X is asked to suppose the following situation, “if another child
    accidentally breaks one of your toys”; the most useful and sensible
    thing to do X believes is to break one of his in order to be
    even ... -1.53  .62
14. X is regarded by adults as impetuous, arrogant and
    double-faced ... -1.52  .57
15. X feels a certain pleasure in hurting an animal ... -1.51  .59
16. X has a reputation among his classmates for crabbing and knocking.
    He is [illegible] ... -1.49  .48
17. In a game at a party X cheats in order to increase his score, where
    prizes are given for the winner ... -1.49  .55
18. X has a reputation among his classmates for always picking on others
    and [illegible] ... -1.48  .57
19. X feels a certain pleasure in hurting a person ... -1.48  .52
20. X believes that a lot of money, cleverness, a little trickery, a lot
    of property [illegible] ... -1.47  .71
21. An adult observer thinks X is totally blind to the consequences of
    his acts on himself or others ... -1.46  .65
22. An adult observer thinks X is bull-headed, unconvincible, will not
    even hear [illegible] ... -1.46  .76
23. An adult observer thinks X wants what he wants at once, and howls or
    makes [illegible] ... -1.43  .69
24. X says he likes to tease people until they cry ... -1.43  .58
25. An adult observer thinks X is not reliable; must be watched; takes
    advantage [illegible] ... -1.42  .50

Cooperation is a second factor rated high, in this study, as an 
index of character. (See Table III, Items 4, 9, 13, 19, 21, 22, 25.) 
There are at least two distinct ideas in cooperation. One is fitting in 
with other people’s ideas, the other is making your own contribution to 
a social problem. Most of the items in existing scales are closer to 
the conformity idea. But is not an essential value of cooperation lost, 
if it is regarded and is tested as willingness to accept and to work out 
someone else’s plan (pictures for hospital children, arithmetic sums 
contributed for a class contest, etc.)? Moral knowledge tests like- 
wise commonly emphasize adoption of a standard set by society. 
Some few items do give greater emphasis to devising new and better 
solutions. In Item 9 (Table III) for instance, the superior character is 
shown not by refraining from teasing, or by admonishing sinners, 
but by inventing a game which will change the social situation. 
This seems properly rated higher than Item 19 which represents only 
conformity to rules made by others. In a period in which more plan- 
ning is called for—family planning, community planning, economic 
planning—should we not work out more tests which will show the 
capacity of pupils to face a difficulty and contribute good ideas toward 
solution? This is close to the objective of “integration of values”
set forth in the N. E. A. Department of Superintendence Yearbook on
Character Education. It is really near the heart of the democratic
ideal.

It is worthy of note that the item most consistently rated high in 
the whole questionnaire is the one which refers to spontaneous creative 
work (Item 1, Table III). Here is something apparently regarded as 
conspicuous evidence of excellent character, but no one has yet 
attempted, so far as our knowledge goes, to construct a test for such 
behavior. Every teacher has opportunity to see this phase of character
superiority in action. Which pupils do undertake original creative
work of their own accord? Conduct reports might well give more space
to a few sentences describing such contributions and less space to
the traditional marks in industry, promptness and other lesser
virtues.

A third evidence of desirable character emphasized by these judges
from among the many suggestions is “being well liked by the group.”
(See Table III, Items 8, 11, 17.) Popularity or acceptability can
usually be measured in valid fashion by the vote of classmates.

Turning now to the negative items (latter part of Table III), those
which show badness in indisputable form, we see that most of these
are reputation matters. Most weight is given by these judges to the 
opinion of teachers or classmates that X is unreliable (7, 9, 10, 25), 
vulgar (4, 11), selfish (1), wicked (2), disobedient (12), two-faced (14), 
complaining (16), or a bully (17). It is not hard to identify clear cases 
on these points through ratings. 

Among the forms of conduct rated as worst are cruelty (15, 19, 24), 
cheating (8, 17), and stealing (3). Some tests of these behaviors have 
been developed, and others could readily be added. The heavy
condemnation laid by the judges on enjoyment of suffering in others
suggests that a graded battery of tests in this area is needed. Landis,
it will be remembered, has recorded emotional symptoms while subjects 
were butchering a white rat, but the range of consideration for others 
versus hard-boiled inhumanity has hardly been explored experimentally. 

Tests of moral attitude that reveal a tendency toward “getting by,”
toward getting even, toward wealth and “pull” as success ideals, are
found important. There is a splendid opportunity to express the 
awakening social conscience of this generation in tests and teaching 
which will help pupils break with the accepted success morality of the 
past. 

Item 6 presents the negative side of the popularity pattern occurring 
among the positive items. 

Summarizing for Table III, and stating each factor positively, we 
may say that teaching and tests are called for, which will show: 


1. Trustworthiness, reliability, dependability in the face of strong temptation.
2. Ability to cooperate, not merely by accepting the plans of others but also
by adding creatively to them.
3. Acceptability in the group.
4. Kindness, consideration for the feelings of others.


A study of items rated low in validity is instructive. It is possible, 
of course, for a test-maker to use such items and to demonstrate that 
















TABLE IV.—ITEMS APPARENTLY LITTLE RELATED TO CHARACTER

Item                                                        Judges' ratings
                                                            Median    Q

Positive
1. X finds most of (his) (her) likes and dislikes nearer to (his) (her) personal ideal than are the likes and dislikes which X thinks other children have .... .04  .96
2. X has spent two years in one grade .... .06  .63
3. [illegible] .... .07  .58
4. X is confronted with the following situation: “Joe is a leader. All the fellows do what he tells them.” Regarding this statement X believes that he is sure that he is not like Joe and secondly he is just as sure that he would [illegible] .... .09  .97
5. [illegible] .... .10  .62
6. X is saving money [illegible] .... .12  1.00
7. X does not have an allowance or spending money .... .13  .58
8. Some kinds of food make X sick, so he says .... .13  .56
9. X has a very poor idea of what constitutes good home furnishings .... .14  .58
10. X sits at desk, fingering object, such as pencil or pencil box .... .14  .51
11. Mother describes the habitual behavior of X as [illegible] .... .15  .72
12. X answers question of teacher [illegible] .... .16  .59
13. An adult observer thinks X can keep only secrets of great importance when [illegible] .... .17  1.10
14. X says that things often get misty before his eyes .... .17  .59
15. X joins a free group without participating in conversation of group .... .19  .68
16. [illegible] .... .20  .55
17. X comes in or goes out of classroom alone, not speaking to anyone. Quietly .... .22  .78
18. X does not go to a Sunday school or some other religious school .... .22  .54
19. A pupil comes to school with his broken arm in a sling. X thinks that the most [illegible] and useful thing to do would be to advise him not to come around the fellows, and telling him that he'll sure be hurt .... .23  1.03
20. [illegible] .... .24  .53
21. X has not planned what he will do when he grows up .... .24  .53
22. X says, “I don’t know” when called on to recite .... .25  .57
23. [illegible] .... .25  .49
24. [illegible] .... .28  .54
25. An adult observer thinks he is equally balanced between reason and [illegible] .... .28  .95

Negative
26. X looks through a book in apparently aimless manner, fluttering pages .... -.02  .58
27. [illegible] .... -.04  .75
28. After twenty minutes of continuous effort X has fallen off considerably in the rate at which simple addition problems are being performed .... -.04  .63
29. X knows less about etiquette than the average child .... -.05  .59
30. X does something, but less than the average child, in making scrapbooks [illegible] .... -.06  .85
31. An adult observer thinks X will work when urged. Loves his ease .... -.06  .84
32. When X is asked, “Did you ever accept the credit or honor for [illegible] when you knew the credit or honor belonged to someone else?” X answered this question in the affirmative .... -.07  1.32
33. [illegible] .... -.07  1.01
34. X complains about some school condition. (Example: The kind of gym [illegible]) .... -.07  .64
35. [illegible] .... -.08  .57
36. X is considerably slowed down in the accomplishment of simple arithmetic, if instead of a plain sheet of paper, the margins are filled by distracting [illegible] .... -.09  .66
37. X reads part of a story. He is given the rest of it with the words run [illegible] .... -.10  .68
38. X is the type of person who respects proper authority only. (Adult report) .... -.11  1.21
39. X is punctual and careful in all undertakings and for this reason prides (himself) (herself) as the best child in the house. (Adult report) .... -.11  1.07
40. X feels that he knows little about some specified desirable topics and a great deal about some specified undesirable topics .... -.11  .81
41. [illegible] .... -.14  .61
42. X thinks that he likes very much to be a good talker but thinks that most boys don’t care. X thinks that in that case he ought to feel that he [illegible] .... -.15  .71
43. [illegible] .... -.16  .87
44. [illegible] .... -.16  .69
45. An adult observer thinks X is indifferent as to whether or not he works with [illegible] .... -.16  .69
46. X knows little about good books, magazines, literature and music .... -.17  .76
47. X says he cannot stand the sight of blood .... -.18  .56
48. X says he has the same dream over and over .... -.19  .72
49. X is not at all times well oriented. (Adult report) .... -.20  .61
50. X writes aimlessly on board or drawing, before class .... -.20  .55
















they are, in spite of appearances, important indicators of character. 
If he does so, however, he will have to defend them before a jury of 
educators now disposed to regard such items as irrelevant or 
insignificant. 

It is of interest to note in Table IV that these judges would regard 
many of the symptom questions popularized by Woodworth, 
Thurstone, Laird, Bernreuter and others as irrelevant for character 
(e.g. Table IV, Items 5, 8, 14, 20, 24, 33, 35, 43, 44, 47, 48, 49). 

Some of the typical tests of inhibition and self-control are similarly 
regarded as having little bearing on character (10, 11, 23, 26, 50). 
Items 28 and 37 suggest that persistence tests which Maller, Hartshorne 
and May included in their battery, are not thought by these judges to 
merit much weight. 

Home background is not in itself evidence of character (e.g.,
Items 3, 6, 7, 9, 29, 46). Religious training is rejected as in itself
showing anything about character (Item 18).

Most of the school virtues, however over-emphasized they may be 
on report cards today, are said by these judges to tell us little about 
character. Thus promotion (2), thrift (6), sitting still (10), keeping
quiet (11, 17), reciting well (12, 22), taking part (15), sitting up straight
(16), having a life goal (21), showing interest in class (23), concentration
(26), effort (28, 31), punctuality (39), and careful preparation of
assignments (41) are stressed by many teachers, but cannot be
considered as having any general or widely recognized character
significance.

Table IV thus suggests that in character testing, less attention 
should be given to school virtues, home culture, persistence, inhibition, 
self-control, and the ordinary run of neurotic symptoms embodied in 
the personality questionnaires. There is confirmation of this
interpretation in the fact that none of these were represented by items in the
“high” ratings (Table III).

In final summary and conclusion: 

1. Judged by the type of item in the test, the most obviously
valid and acceptable battery of existing character measures for children 
might well include: (a) The Maller self-marking test, (b) coordination 
test, (c) guess who reports from classmates, (d) check list from teacher, 
(e) opinion ballot, or good citizenship test. 

2. Less emphasis should be given to measures of (a) neurotic
symptoms, (b) inhibition, self-control, (c) home background and (d)
reports on academic deportment except as these can be shown to
correlate highly with other more valid indices of character.

3. More emphasis should be given to developing better tests and
better teaching methods which will aid in developing (a) trustworthiness
despite strong temptation, (b) creative cooperation as contrasted
with mere conformity to the group, (c) popularity, and (d) consideration
for the feelings of others.





A PICTURE PERFORMANCE SCALE¹


JACOB TUCKMAN 
Hebrew Sheltering Guardian Society, Pleasantville, New York 


Performance scales and tests have arisen, primarily, as a supple- 
ment to, and, secondarily, as a substitute for individual and group 
scales and tests which are dependent chiefly upon language or verbal 
ability. As a supplementary measure, the existing performance tests 
and scales have demonstrated their usefulness in the testing of the 
deaf, foreigners, and other groups suffering from a language disability. 
But, as a substitute, they have met with little success because they 
possess many serious inherent limitations and because other verbal 
tests cover the same ground more adequately. An examination of the 
outstanding and representative performance scales reveals the follow- 
ing difficulties and objections which have hindered their widespread 
use: Cumbersomeness, excessive cost, extended period of training for 
their proper administration, difficulty of scoring, restricted range, 
coachability of sub-tests in which puzzle element is present, difficulty 
of securing additional forms, lack of interest for subjects of all ages, etc. 

The scale of performance tests to be described in this paper was
constructed in an attempt to overcome the objections and difficulties
presented above. It involves the ability on the part of the subject
to perceive the relation of the parts of an action or episode to the
completed action or episode, which consists of a definite and correct
order or sequence. The scale is somewhat similar to the Healy
Picture Completion I² and II³ in content and to the Ship Test of the
Pintner-Paterson Scale of Performance Tests⁴ in procedure; but differs
from the former in that no choice or alternative is permitted and no
suggestion is given as to the previous action or event, and differs from
the latter in that each part is a separate action or event. The subject
is concerned with the placement of these parts into their correct
order or sequence after the action, episode, or series of events has been
broken up into its several parts and presented in a random order.

¹ I am indebted to Mr. Requa W. Bell, Superintendent of Schools, Pleasantville,
N. Y., for his interest and permission to conduct this study; and to Miss E. Myrtle
Baker, Principal of the West Side School, for her cooperation and assistance in
making available the subjects for this experiment.

² Healy, W.: “A Pictorial Completion Test.” Psychological Review, 1914, pp.
189-203.

³ Healy, W.: “Pictorial Completion Test II.” Journal of Applied Psychology,
1921, pp. 225-239.

⁴ Pintner, R. and Paterson, D. G.: A Scale of Performance Tests. D. Appleton
& Co., 1917, pp. 58-61.


SUBJECTS 


This preliminary report covers the results on one hundred fourteen 
subjects—sixty-eight boys and forty-six girls—who ranged in grade 
from the Kindergarten through the eighth grade of an elementary 
school in a town in the suburbs of New York City. The great majority
of the children were of American nativity but about twenty per
cent (ten boys and twelve girls) were of Italian parentage. In
general, the children of the Italian group were of lower socio-economic
status than those of native parentage. The children ranged in age
from five years, six months to seventeen years, four months and in
IQ from sixty-eight to one hundred thirty-two. The mean chronological
age was ten years, three months and the mean IQ was one hundred
six. Ratings on one or more group intelligence tests were available for 
each subject. 


MATERIALS 


The scale consists of thirty-seven tests, of which three have been 
used as sample tests to familiarize the subject with the procedure, and 
to give the subject an idea of what is required. With the exception of 
one sample, the tests are cartoons or drawings which have appeared in 
The New Yorker at various times during the period, March, 1930 to 
February, 1933. One of the cartoons was converted into two sample 
tests, while a free-hand drawing was used as a third. Of the thirty-five
cartoons taken from The New Yorker, thirty are drawings by O.
Soglow, two by Fruech, and one each by Don Herold, Rea Irvin, and I.
Klein. The drawings of O. Soglow, which are concerned
chiefly with incidents and happenings in the life of a Little King,
are very interesting and a source of tremendous satisfaction to both
adults and children because of the infantile personality and behavior
of the Little King. The drawings of the other artists are also interesting
and represent episodes with unexpected or surprise conclusions but
do not possess the infantile quality to the extent portrayed in O.
Soglow’s cartoons. In the selection of the tests, only those cartoons
were chosen which could be given without the use of language and in 
which there was some relationship between the various parts of the 
action. Several excellent pictures of a more sophisticated nature had 
to be discarded. 







The cartoons range from two to twelve parts which vary in size
from one and one-half by two and one-fourth inches to two and one-half
by seven inches. Of the thirty-seven drawings, two (sample drawings)
consist of two parts, five (including one sample) of three parts, three
of four parts, three of five parts, nine of six parts, three of seven parts,
four of eight parts, three of nine parts, three of ten parts, and two of
twelve parts. The following gives the order of appearance in the series,
the number of parts, the artist, the date of publication in The New
Yorker, and a brief description of the various parts of four drawings
as originally published. (These four drawings were selected at random
from the entire series.)


Test 9.—Five parts—O. Soglow—1/2/32.

Subordinate salutes and addresses officer. Officer salutes and addresses Little
King. Little King runs, followed by officer. Little King, followed by officer,
slides down fire pole. Little King is riding on fire engine with other firemen.

Test 16.—Six parts—O. Soglow—7/16/32.

Little King approaches window. Little King looks out of window. Flagpole
sitter is seen. Little King turns from window and appears to be thinking. Little
King talks to lackey. Little King is sitting in an easy chair on top of flagpole.

Test 23.—Eight parts—O. Soglow—4/10/32.

Little King approaches with escort. Little King is with escort in a box at a
baseball field. Little King throws first ball of the season. Fielder gets ready to
catch ball. Ball goes over fielder’s head. Fielder runs after ball which is still
beyond his reach and it goes over the fence. Ball breaks store window. Little
King runs home.

Test 34.—Twelve parts—O. Soglow—7/30/32.

Little King, ready to go golfing, is walking on the links. Lackey hands Little
King his club. Little King is ready to tee off. King hits the ball. Little King,
lackey, and caddy are walking. All look for the ball. Little King informs
lackey. Lackey runs for aid. Lackey tells guardsman. Bugler calls out regiment.
Regiment runs at top speed to the golf links. The regiment joins in the search for
the golf ball.


PROCEDURE 


The parts of each drawing were mounted on cardboard and then 
wrapped in colorless cellophane to prevent soiling and mutilation. In 
several cartoons where language was used, the print was deleted. 
Each cartoon was kept in a separate envelope, in the lower left-hand 
corner of which was placed a number from one to thirty-seven which 
indicated the order in which the cartoon was to be presented to the 
subject. In the center of the envelope was placed a number or numbers
which indicated the number of parts of which the cartoon consisted
and the manner in which the cartoon was to be presented. For
example, Test 1 has the number “3” in the center of the envelope and
this means that the cartoon is made up of three parts and that all three
parts are to be placed before the subject in one horizontal row. Test
11 is marked “3-3” and this indicates that the cartoon consists of
six parts and that three parts are to be placed before the subject in one
horizontal row while the remaining three parts are to be placed in a
row directly under the first row. It was necessary to adopt this procedure
as all the parts could not always be conveniently presented in one
row on a desk of average size. On the reverse side of each part of
each cartoon were placed two numbers. The number in the upper 
right-hand corner indicates the correct or actual position of that 
part in the entire series. The number in the center indicates its posi- 
tion in the row or rows put before the subject. If a cartoon consisted 
of six parts, the parts were numbered from one to six in an arbitrary 
fashion and were always presented in that order. 
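
The envelope marking described above can be read mechanically. The following is a minimal sketch in Python and is not part of the original procedure. The function name `parse_layout` is hypothetical, and the sketch assumes that every marking is a series of row sizes joined by hyphens, as in the two markings quoted in the text.

```python
def parse_layout(marking):
    """Read an envelope marking such as "3" or "3-3".

    Returns a list with the number of parts in each horizontal row,
    together with the total number of parts in the cartoon.
    Assumption: row sizes are joined by hyphens, as in the two
    markings quoted in the text.
    """
    rows = [int(size) for size in marking.split("-")]
    return rows, sum(rows)


print(parse_layout("3"))    # Test 1: a single row of three parts
print(parse_layout("3-3"))  # Test 11: two rows of three parts each
```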


DIRECTIONS FOR ADMINISTERING 


The examiner should sit opposite the subject. Any table or desk is 
suitable but a lower table or desk should be used with younger children. 
The following directions were given: “Today we are going to play a
game.” Present Sample A to subject in the correct order (as originally
published), i.e., following the numbers in the upper right-hand corner of
the reverse side; and then say while pointing to Part 1: “Here the boy
is getting ready to throw the ball.” Then point to Part 2 and say:
“Here the boy is throwing the ball.” Then reverse order of parts and
say: “This is not the right way because the boy has to hold the ball
before he throws it.” Then again indicate the correct order.

Then take Sample B, present in correct order, point to the drawings
and say: “Here a man is playing on a drum while the Little King
is watching him. Then the man gives it to the Little King so that
he can try it. And in this picture, the Little King tries it himself.”
Put Part 2 after Part 3 and say: “You see, this is not the right way.”
Then again indicate the correct positions.

Then take Sample C, present in correct order, point to the drawings
and say: “Here is a man pressing the lid of a Jack-in-the-Box
in this first picture. And in this second part, the Jack-in-the-Box
jumps out.” Put Part 1 after Part 2 and say: “You see, this is not
the right way,” and indicate correct positions.

In Samples A, B, or C, if subject does not understand, repeat
directions.










Then present Sample A in the order indicated in the center of the 
pictures and say: “Put these in their right order.” If the order is 
correct, praise the subject, and then present Samples B and C. If 
the order is incorrect, present samples again in the manner described 
above. 

Then present cartoons in the order listed, starting with Test 1, 
etc. Ask subject to close his eyes or to turn around while you arrange 
the cartoon in the order indicated in the center of the picture on the 
reverse side. When this is done, tell subject to open eyes and to put 
the pictures into the correct order, just as he did in the samples. If 
subject does not attempt to place the parts into their correct order 
and indicates that he is satisfied with examiner’s arrangement, say: 
“No, this is not right. You fix them in the right order.” If the 
subject rearranges the parts even though the order is incorrect, no 
comment should be made. After the first cartoon is presented, and 
reacted to, no comment should be made on the following pictures 
even though the subject is satisfied with examiner’s arrangement and 
does not attempt to put them in the proper order. 

As soon as Test 1 is completed, put it to one side, and set up Test 2
during which time the subject is asked to close his eyes. When he 
begins work on Test 2, copy on to the scoring sheet the numbers 
in the upper right-hand corner of the pictures as arranged by the 
subject. 

With the exception of a few cases, each child completed the entire 
scale at one sitting. There was no time limit. The time spent on 
each test was kept for several subjects but as there was no positive 
correlation between time and chronological age, grade, or mental age, 
this practice was discontinued and time was kept for the entire scale 
only. The time spent varied from thirty to seventy-five minutes. 
As a rule, younger children spent less time since they seemed less 
critical of their performance than did older children. 


SCORING 


The scale was scored by three different methods. Method 1,
which makes use of partial credits, is similar to that used in scoring
the Ship Test of the Pintner-Paterson Scale.¹ The procedure is
as follows: The numbers on the scoring sheet for a cartoon of six parts
which have been placed in correct sequence would be 1 2 3 4 5 6.
Credit is given only to correct sequences of two parts or more. Each
part of the correct sequence is given a score of one. An additional
score of one is given for a correct sequence of two parts, two additional
points are given for a correct sequence of three parts, three additional
points are given for a correct sequence of four parts, etc. A correct
sequence of two parts is given a score of 3; a correct sequence of three
parts is given a score of 5; a correct sequence of four parts is given a
score of 7, etc. Thus, in the example given above, where the order,
according to the key, was 1 2 3 4 5 6, a score of 11 is given. Where
the order is 2 1 3 4 5 6, credit is given only to the correct sequence,
3 4 5 6, and here the score is 7. Where the order is 1 2 4 3 5 6, credit is
given only to the sequences, 1 2 and 5 6, and the score is 6. Where the
arrangement is 1 3 2 4 6 5 or 6 3 2 5 1 4, the score is zero.

¹ Pintner, R. and Paterson, D. G.: A Scale of Performance Tests. D. Appleton
& Co., 1917, p. 59.
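
The partial-credit rule of Method 1 may be sketched in Python as follows. This is an illustration only, not the author's scoring sheet. The function name `method1_score` is hypothetical. The sketch assumes that a "correct sequence" is any unbroken run of consecutive key numbers lying side by side, wherever in the row it falls; the text does not say whether the run must also occupy its correct absolute position, and the worked examples in the text come out the same under either reading.

```python
def method1_score(arrangement):
    """Partial-credit score for one cartoon under Method 1.

    `arrangement` lists the key numbers (1 = first part of the cartoon
    as originally published) in the left-to-right order chosen by the
    subject. A run of k >= 2 consecutive key numbers earns k points
    plus k - 1 additional points, i.e. 2k - 1.
    Assumption: a run counts wherever it appears in the row.
    """
    score = 0
    run = 1  # length of the current run of consecutive key numbers
    # Pair each part with its right-hand neighbour; None closes the last run.
    for prev, cur in zip(arrangement, list(arrangement[1:]) + [None]):
        if cur is not None and cur == prev + 1:
            run += 1
        else:
            if run >= 2:
                score += 2 * run - 1
            run = 1
    return score


# The five arrangements worked through in the text.
for order in ([1, 2, 3, 4, 5, 6], [2, 1, 3, 4, 5, 6], [1, 2, 4, 3, 5, 6],
              [1, 3, 2, 4, 6, 5], [6, 3, 2, 5, 1, 4]):
    print(order, method1_score(order))
```

Run on the arrangements given in the text, the sketch returns 11, 7, 6, 0, and 0, in agreement with the scores stated there.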

Method 2 is not concerned with the giving of partial credits. It 
gives credit only to perfect performances. Each cartoon, regardless 
of the number of parts of which it consisted, was given a score of 1, 
if done entirely correctly. 

Method 3 takes into consideration only deviations from the correct
positions or arrangement according to the key. The sum of the
squared deviations of the subject’s performance from the correct
sequence, that is, according to the key, constitutes the score. Let
the correct order again be 1 2 3 4 5 6. Where the order is 2 1 3 4 5 6,
deviations from the correct positions would be the difference between
1 and 2, or 1; the difference between 2 and 1, or 1; the difference
between 3 and 3, or 0; the difference between 4 and 4, or 0; the difference
between 5 and 5, or 0; and the difference between 6 and 6, or 0;
and the sum of these deviations squared would yield a score of 2.
Where the order is 4 3 1 2 5 6 or 6 3 2 5 1 4, the deviations would be
3, 1, 2, 2, 0, 0 and 5, 1, 1, 1, 4, 2, respectively, and the sum of the squares
of the deviations would yield scores of 18 and 48, respectively. By
this method, a high score indicates a poor performance. Method 3
is superior in one respect to the other scoring schemes in that a zero
score is meaningful.
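
Methods 2 and 3 lend themselves to an equally short sketch. The function names below are hypothetical; the arithmetic follows the description above, with the key taken to be 1, 2, 3, ... in left-to-right order.

```python
def method2_score(arrangement):
    """All-or-none score: 1 if every part is in its key position, else 0."""
    key = list(range(1, len(arrangement) + 1))
    return 1 if list(arrangement) == key else 0


def method3_score(arrangement):
    """Sum of squared deviations of each part from its key position.

    A high score indicates a poor performance; zero is a perfect one.
    """
    return sum((part - position) ** 2
               for position, part in enumerate(arrangement, start=1))


# The three arrangements worked through in the text for Method 3.
for order in ([2, 1, 3, 4, 5, 6], [4, 3, 1, 2, 5, 6], [6, 3, 2, 5, 1, 4]):
    print(order, method2_score(order), method3_score(order))
```

For the three arrangements in the text the Method 3 scores are 2, 18, and 48, as stated; each of them earns zero under Method 2.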


VALIDITY 


To determine the validity of the scale, three criteria were employed:
mental age, chronological age, and school grade. There was available
for each subject at least one IQ rating and in many cases there were
two or more. These ratings were only on the basis of group intelligence
tests. For the Kindergarten grade, the ratings were based on the
Detroit Kindergarten Test; for Grades I to III, on the Detroit Kindergarten
and the Detroit First Grade Tests; for Grades IV to V, on the
Detroit First and the National Intelligence Tests; and for Grades
VI to VIII, on the National Intelligence Test. Where more than one
test had been given, the average IQ was computed. The average IQ
by grade for the children tested is as follows:


GRADE                 IQ
Kindergarten ....    98.1
I ...............   105.8
II ..............   107.8
III .............   107.9
IV ..............   105.8
V ...............   112.1
VI ..............   100.4
VII .............   111.4
VIII ............   101.5


Since the group tests had been given some time previous to the per- 
formance scale, the mental age was obtained by multiplying the IQ 
by the chronological age at the time the performance scale had been 
given. The school where the experiment had been carried on does 
not have the system of half grades and since the testing was done 
toward the end of the school year, all subjects were in the second half 
of the school grade. 
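
The computation of mental age may be sketched as follows. The text states only that the IQ was multiplied by the chronological age; the division by 100 is an assumption made here from the usual definition of the IQ as one hundred times the ratio of mental to chronological age. The function name and the child in the example are hypothetical and are not drawn from the data of this study.

```python
def mental_age_months(iq, chronological_age_months):
    """Mental age in months from a ratio IQ and chronological age.

    Assumption: IQ = 100 * MA / CA, so MA = IQ * CA / 100.
    """
    return iq * chronological_age_months / 100


# Hypothetical child: IQ of 110 at a chronological age of ten years.
ma = mental_age_months(110, 10 * 12)
print(ma, "months, or", int(ma // 12), "years", int(ma % 12), "months")
```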

Correlations by the product-moment method were computed 
between score on the basis of Methods 1, 2, and 3 and grade, chrono- 
logical age, and mental age. An attempt was then made to improve 
the scale by eliminating those tests which possessed no diagnostic 
value. An item analysis was made on the basis of scoring Methods 1 
and 2 by calculating the average score on each test for each grade, 
chronological age, and mental age category. Of the thirty-four tests 
in the scale, thirty-three were found to have some diagnostic value 
either at all grade, chronological age, and mental age levels or at parts 
of the range. However, it was decided to retain the one test which 
seemed to have no differentiating value until the scale had been given 
to additional subjects. 

An additional item analysis was made by computing the correlations
between mental age and score on each test as obtained by
Method 3. A negative correlation would indicate some diagnostic
value since by Method 3, a high score is indicative of a poor performance.
These correlations ranged from -.07 to -.74. Four
tests were discarded when it was decided to eliminate those tests
whose correlations were weaker than -.20, that is, smaller than .20
in absolute value. When the scale was rescored
on the basis of the thirty discriminating tests, the results were slightly
better than those obtained on the basis of the entire scale. The results
were not improved further when a more stringent criterion than
-.20 was applied.
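
The elimination rule can be illustrated with a brief sketch. The function name, the test labels, and two of the four correlations below are hypothetical; only -.07 and -.74, the extremes of the reported range, come from the text. Treating a correlation of exactly -.20 as retained is likewise an assumption, since the text does not cover that case.

```python
def discriminating_tests(correlations, cutoff=-0.20):
    """Keep the tests whose correlation with mental age is -.20 or stronger.

    Under Method 3 a high score is a poor performance, so a useful test
    correlates negatively with mental age. Tests with a correlation
    weaker than the cutoff (closer to zero) are dropped.
    """
    return {test: r for test, r in correlations.items() if r <= cutoff}


# Hypothetical labels; -.19 and -.35 are invented for illustration.
example = {"Test A": -0.07, "Test B": -0.19, "Test C": -0.35, "Test D": -0.74}
kept = discriminating_tests(example)
print(sorted(kept))
```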

The correlations between score by Methods 1, 2, and 3 on all 
thirty-four tests and between score by Method 3 alone on thirty tests, 
and grade, chronological age, and mental age appear in Table I. 
The correlations between score by Method 1 on the entire scale and 
grade, chronological age, and mental age are .81, .74, and .85, respec- 
tively, and are higher than those obtained by Methods 2 and 3, and on 
thirty tests by Method 3 alone. In view of the fact that higher 
correlations were obtained between score by Method 1 and our criteria, 
only Method 1 was retained and hereafter our results will be presented 
in terms of this method only. When the details of scoring by this
method are mastered, the entire scale can be scored in five minutes, or
each test can be scored while the subject is working on the one following.
Although the scale had been given to children from the Kindergarten
through the eighth grade, the results indicate that the scale may
be suitable for subjects from below the Kindergarten through college
level. Several graduate students who had been tested were not able to 
achieve perfect scores. 


TABLE I.—CORRELATIONS BETWEEN SCORE BY METHODS 1, 2, AND 3 ON THIRTY-FOUR
TESTS AND BETWEEN SCORE BY METHOD 3 ALONE ON THIRTY TESTS, AND GRADE,
CHRONOLOGICAL AGE, AND MENTAL AGE

                | Method 1¹ (34) | Method 2¹ (34) | Method 3¹ (34) | Method 3¹ (30)
                |   r      PE    |   r      PE    |   r      PE    |   r      PE
Score and grade |  .81    .02    |  .73    .03    | −.79    .02    | −.78    .02
Score and CA    |  .74    .03    |  .68    .03    | −.73    .03    | −.71    .03
Score and MA    |  .85    .02    |  .80    .02    | −.75    .03    | −.80    .02

1 Correlations (r) between score by Method 1 and score by Method 2, between
score by Method 1 and score by Method 3, and between score by Method 1 and
score by Method 3 on the basis of thirty tests are .89, −.94, and −.93, respectively.
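The PE columns of Table I can be checked with a short sketch. It assumes the classical probable-error formula for a correlation coefficient, PE = .6745(1 − r²)/√N, which the article does not state explicitly, and it takes N as the one hundred fourteen subjects of the study. The function name is hypothetical.

```python
import math


def probable_error_of_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


N_SUBJECTS = 114  # number of subjects reported for the scale

# Method 1 correlations with grade, chronological age, and mental age.
for label, r in [("grade", 0.81), ("CA", 0.74), ("MA", 0.85)]:
    pe = probable_error_of_r(r, N_SUBJECTS)
    print(f"Score and {label}: r = {r:.2f}, PE = {pe:.2f}")
```

Under this assumption the three values round to .02, .03, and .02, in agreement with the Method 1 column.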


The average scores for grade, and for chronological age and mental
age, appear in Tables II and III. The lowest score is thirty-two and
the highest, three hundred forty-one, giving a range of three hundred
nine points. The maximum score possible is four hundred twenty.
The mean score is 196.5 with a standard deviation of 83.4 points. The
average score for grade, chronological age, and mental age increases
from sixty-nine for the Kindergarten grade, sixty-seven for a
chronological age of five, forty-five for a mental age of four, to two
hundred eighty-one for Grade VIII, two hundred seventy-one for a
chronological age of seventeen, and three hundred nineteen for a mental
age of sixteen.

The evidence presented in Tables II and III is clear in showing that
there is a progressive increase of score with grade, chronological age,
and mental age. There are several variations but the trend is
unmistakable and consistent. In Table II, the average score for Grade VI
is lower than that for Grade V but the average score for Grades IV and V
combined is lower than that for Grade VI alone. In Table III, the
irregularities in the performance of chronological ages fourteen,
fifteen, sixteen, and seventeen are unreliable because of the small
number of cases within these age categories. A mental age of seven
has a higher average score than a mental age of eight, and a mental
age of eleven shows a better average performance than a mental age of
twelve; but the average score of mental ages six and seven combined
and the average score of mental ages ten and eleven combined are
lower than those of mental ages eight and twelve alone, respectively.


TABLE II.—AVERAGE SCORE FOR GRADE: KINDERGARTEN TO GRADE VIII
One Hundred Fourteen Subjects

Grade        | N¹ | Average score
Kindergarten | 10 |  69
I            | 12 |  95
II           | 13 | 140
III          | 14 | 177
IV           | 13 | 214
V            | 13 | 239
VI           | 13 | 234
VII          | 13 | 276
VIII         | 13 | 281

1 N represents the number of cases in each grade.










TABLE III.—AVERAGE SCORE FOR CHRONOLOGICAL AGE: FIVE TO SEVENTEEN,
AND FOR MENTAL AGE: FOUR TO SIXTEEN
One Hundred Fourteen Subjects

    | Average score for      | Average score for
    | chronological age      | mental age
Age | N  | Average score     | N  | Average score
4   |    |                   | 2  |  45
5   | 6  |  67               | 4  |  71
6   | 13 |  91               | 9  |  78
7   | 10 | 123               | 11 | 126
8   | 15 | 159               | 11 | 119
9   | 11 | 213               | 12 | 176
10  | 12 | 220               | 13 | 209
11  | 14 | 246               | 10 | 261
12  | 11 | 248               | 11 | 235
13  | 10 | 284               | 16 | 267
14  | 4  | 259               | 10 | 274
15  | 3  | 270               | 4  | 293
16  | 2  | 253               | 1  | 319
17  | 2  | 271               |    |
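The remarks about combined groups can be verified from the tabled figures. The sketch below assumes that "combined" means an average weighted by the number of cases in each group, which the article does not spell out; `combined_mean` is a hypothetical helper.

```python
def combined_mean(groups):
    """Weighted mean of (n, average) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * avg for n, avg in groups) / total_n


# (N, average score) pairs read from Tables II and III.
grades_iv_v = [(13, 214), (13, 239)]
ma_6_7 = [(9, 78), (11, 126)]
ma_10_11 = [(13, 209), (10, 261)]

print(round(combined_mean(grades_iv_v), 1))  # compare with 234 for Grade VI
print(round(combined_mean(ma_6_7), 1))       # compare with 119 for mental age eight
print(round(combined_mean(ma_10_11), 1))     # compare with 235 for mental age twelve
```

Each combined value falls below the score of the next higher group, which is the regularity the text points to.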

RELIABILITY 


Since the school year was at a close when the preliminary testing
had been completed, it was not possible to retest any of the subjects,
and the reliability of the scale was obtained by the split-halves
method. The scale was first graded into an order of difficulty by
dividing the average score on each test for the entire group by the
maximum score obtainable on that test. This value was called the
difficulty value. Thus, where the average score obtained by the
group on a test consisting of twelve parts is 16.1 points, the difficulty
value is secured by dividing 16.1 by 23, the maximum score, and equals
.72. A lower figure indicates a more difficult test. After the tests
had been arranged into an order of increasing difficulty, the scores on
Tests 1, 4, 5, 8, 9, etc. were correlated with the scores on Tests 2, 3,
6, 7, etc. The reliability coefficient obtained for the half-test was
.88, and when corrected by the Spearman-Brown formula became .93, with
a PE of .01.
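The split and the correction can be sketched as follows. The assignment rule is inferred from the test numbers quoted in the text (an A, B, B, A alternation over the difficulty-ranked tests), and the correction is the standard Spearman-Brown formula for doubling test length; the function names are hypothetical.

```python
def abba_split(n_tests):
    """Assign tests ranked 1..n_tests to two halves in the order A, B, B, A."""
    half_a = [i for i in range(1, n_tests + 1) if i % 4 in (1, 0)]
    half_b = [i for i in range(1, n_tests + 1) if i % 4 in (2, 3)]
    return half_a, half_b


def spearman_brown(r_half):
    """Estimate full-length reliability from a half-test correlation."""
    return 2.0 * r_half / (1.0 + r_half)


half_a, half_b = abba_split(34)
print(half_a[:5], len(half_a))  # [1, 4, 5, 8, 9] 17
print(half_b[:4], len(half_b))  # [2, 3, 6, 7] 17
print(round(spearman_brown(0.88), 3))  # 0.936
```

Applied to the rounded coefficient .88 the formula gives about .94; the published figure of .93 presumably rests on the unrounded half-test coefficient.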










ADDITIONAL FORMS 


In terms of Method 1, the scale was divided into two almost
equivalent forms each consisting of seventeen tests. Tests 1, 4,
5, 8, 9, etc. were called Form A while Tests 2, 3, 6, 7, etc. were called
Form B. The average scores on Forms A and B for grade, and for
chronological age and mental age, appear in Tables IV and V.

The trend of average score for grade, chronological age, and mental
age which was evident in the results for the entire scale is maintained
in Form A and Form B although the scores on Form A are, in general,
higher than those on Form B. This is as it should be since the highest
score possible on Form A is two hundred thirteen points while the
highest score obtainable on Form B is two hundred seven points. On
Form A, the lowest score is fourteen and the highest, one hundred
seventy-six, giving a range of one hundred sixty-two points. The
mean score is 101.3 with a standard deviation of 44.5 points. The
average score for grade, chronological age, and mental age increases
from thirty-two for the Kindergarten Grade, thirty for chronological
age five, twenty for mental age four; to one hundred forty-three for
Grade VIII, one hundred thirty-seven for chronological age seventeen,
and one hundred sixty-seven for mental age sixteen. On Form B,
there is a range of one hundred fifty-two points, the lowest score being
eighteen and the highest, one hundred seventy. The mean score is
95.8 with a standard deviation of 41.5 points. The average score
increases from thirty-seven for the Kindergarten Grade, thirty-seven
for chronological age five, twenty-five for mental age four; to one
hundred thirty-eight for Grade VIII, one hundred thirty-four for
chronological age seventeen, and one hundred fifty-two for mental age
sixteen.


TABLE IV.—FORMS A AND B: AVERAGE SCORE FOR GRADE: KINDERGARTEN TO
GRADE VIII
One Hundred Fourteen Subjects

             |    | Average score
Grade        | N  | Form A | Form B
Kindergarten | 10 |  32    |  37
I            | 12 |  50    |  45
II           | 13 |  74    |  66
III          | 14 |  92    |  85
IV           | 13 | 108    | 106
V            | 13 | 123    | 116
VI           | 13 | 120    | 114
VII          | 13 | 140    | 136
VIII         | 13 | 143    | 138


TABLE V.—FORMS A AND B: AVERAGE SCORE FOR CHRONOLOGICAL AGE: FIVE TO
SEVENTEEN, AND FOR MENTAL AGE: FOUR TO SIXTEEN
One Hundred Fourteen Subjects

    | Average score for       | Average score for
    | chronological age       | mental age
Age | N  | Form A | Form B    | N  | Form A | Form B
4   |    |        |           | 2  |  20    |  25
5   | 6  |  30    |  37       | 4  |  32    |  39
6   | 13 |  45    |  46       | 9  |  39    |  39
7   | 10 |  66    |  57       | 11 |  69    |  57
8   | 15 |  82    |  77       | 11 |  65    |  55
9   | 11 | 109    | 104       | 12 |  85    |  91
10  | 12 | 113    | 107       | 13 | 108    | 101
11  | 14 | 129    | 117       | 10 | 132    | 129
12  | 12 | 125    | 119       | 11 | 119    | 116
13  | 10 | 145    | 139       | 16 | 138    | 129
14  | 4  | 129    | 130       | 10 | 137    | 137
15  | 3  | 129    | 141       | 4  | 160    | 133
16  | 2  | 130    | 123       | 1  | 167    | 152
17  | 2  | 137    | 134       |    |        |

Correlations were computed between score on Form A and on Form 
B and the criteria. The correlations between score on Form B and 
grade, chronological age, and mental age are .82, .75, and .84, respec- 
tively, which are higher than the correlations of .77, .71, and .82, 
respectively, obtained for Form A. The correlation between score 
on Form A and score on Form B is .88 with a PE of .01. 


ADVANTAGES OF THE SCALE 


The scale possesses the following advantages:
1. Ease of administration and scoring—the details of administration
and scoring can be mastered in an hour.
2. Convenience of handling.
3. Wide range—there is evidence that the scale is suitable from
below the Kindergarten grade through college level.
4. Interest—it has been found to be interesting to both children
and adults.
5. Additional forms—the type of material used makes possible
the construction of a large number of equivalent forms.

No norms have been presented in this preliminary report since
the scale needs to be administered to a larger group of subjects before
an adequate standardization can be secured. It is important and
desirable that individuals be selected from below Kindergarten
through college level or as far as the scale may be found to be suitable.
Better intelligence ratings, preferably on the basis of the
Stanford-Binet Scale, should be obtained for younger subjects.













AN EXPERIMENTAL STUDY OF THE OLD AND NEW 
TYPES OF EXAMINATION: II. METHODS OF STUDY 


GEORGE MEYER 
Psychological Laboratory, University of Michigan 


INTRODUCTION 


In a previous article¹ the writer presented the results of a study on
the effect of the examination set on the memory for a certain type of
sense material. In that article it was pointed out that the differences
found among the various examination set groups (true-false,
multiple-choice, completion, and essay) were probably due to the
different methods of study used by those groups while learning. It is
the purpose of this paper to make an analysis of the methods used by the
four examination set groups.

These methods of study will be considered from two standpoints,
one subjective and the other objective. The data for the subjective
analysis come from the subjects’ answers to some questions given at
the close of their last study period. The first of these questions was:
“Tell how you studied for your particular type of examination.” The
second was: “Did you study differently for this type of examination
from what you would have if you had been assigned one of the other
three types of examination? If so, what different methods did you
use? Be sure you indicate the methods you used for the assigned type
and the methods you would have used if you had been studying for
each of the other types.”

The data for the objective analysis, on the other hand, come from
an examination of what the subjects actually did during the learning
periods; i.e., from a perusal of the notes which the subjects made
during the learning periods and from an examination of the
mimeographed booklet containing the learning material.


RESULTS OF THE SUBJECTIVE ANALYSIS 


The following are some typical excerpts from the subjects’ own 
reports as to how they studied for their particular types of examinations 
and as to how they would have studied had they been assigned one of 
the other types of examinations. 





1 Meyer, George: “An Experimental Study of the Old and New Types of 
Examination: I. The Effect of the Examination Set on Memory.” 


Subject 125 (T-F).—I read the material through and underlined names, places 
and general facts. I reread the entire material with the object of learning the 
things that happened and the results. The third time I reread only part of the 
underlined material. . . . Had I studied for a completion test I would have made 
it my business to learn all the names and places and numbers mentioned... . 
However, when studying for the completion, multiple-choice or true-false
examinations I find that I do not attempt to get a general view of the
material—I try to learn the facts or memorize the statements. When I study
for an essay examination, I read and reread the material with the object of
getting not only the facts but also a general concept of the material.

Subject 43 (Completion).—In studying for a completion test I usually try to 
underline the most important details such as dates, if there are any, and the names 
of people, places, important causes and results. On the second reading, I try 
to get a more general view of things and bind together the underlined items. If I 
were studying to write an essay I should probably read the material three or four 
times to get the general idea of the given material keeping the subject in one unit. 
For the multiple-choice test I should go about studying in much the same way as 
for the completion test. The same would hold for the true and false test. In 
the essay I should also take notes to keep the events in their proper order. The 
notes would be the mere frame-work around which the essay would be built and 
expanded. 

Subject 18 (M-C).—My method of preparation for a multiple-choice examina- 
tion differed from the method I would use in preparing for an essay type in one 
very noticeable respect. If I had not been preparing for a multiple-choice exami- 
nation I would not have put quite so much emphasis on the details. The essay 
type gives one a great deal more freedom, a great opportunity for personal opinion 
and interpretation. When I studied for the multiple-choice examination I was 
especially prepared to know specific details and the finer points. . . . I can’t say 
very definitely that I would have prepared very differently for a true and false 
type since they are fundamentally alike. I probably would have confined myself 
to more memory work if I were preparing for a “fill-in” examination. I always 
seem to find it easier to take a multiple-choice examination in that the possibilities 
suggested in a question bring definite facts to mind. However, the “fill-in” type
makes it essential that one have the facts on the tip of his tongue. . . .

Subject 52 (Essay).—For an essay type test, I usually try to fix the general 
outline, the major drift of the subject, in my mind, and then add as many details 
to the general absorption as my time and energy permit. I usually outline the 
material on paper and try to think it through several times. When false and true, 
completion or multiple-choice tests are expected, I concentrate my attention on 
learning details, definitions, words, figures. I stuff my memory with as many 
facts as I think it likely to retain for the required time, until and including the 
test, and then quickly forget every thing except the few points that appealed to 
me as most important. 


An analysis of all of the subjects’ reports shows that:
1. Of the thirty-one subjects in the true-false group, twenty-eight,
or ninety per cent, hold that they prepared themselves for this type of
test by studying the details of the material. They claim that they
would have prepared themselves for either of the other two types of
objective tests in the same manner, although twelve, or forty-three
per cent, of them indicate that they would have made much more effort
to learn these details if they had been studying for a completion type
of examination. All of this group of twenty-eight individuals claim
that they would have tried to organize the material and to get a general
picture of it if they had been studying for an essay test. Of the
thirty-one subjects, three, or ten per cent, hold that they prepared
themselves for this type of test in the same manner as they would have
prepared for any other type of test. These three claim to have studied
in the same manner as the other individuals would have studied for an
essay test.

2. Of the thirty-one subjects in the multiple-choice group,
twenty-six, or eighty-four per cent, state that their preparation for
this type of test involved studying the material for details; that they
would have used the same method in preparing for a true-false or
completion test; and that they would have attempted to get a general
view of the material if studying for an essay test. Of the twenty-six
subjects, eight, or thirty-two per cent, indicate that they would have
tried to learn the details more thoroughly if they had been studying for
a completion test. Of the thirty-one subjects, five, or sixteen per
cent, say that they studied for this type of examination in the same way
as they would have prepared for any other type of test, i.e., they would
use the essay test method for studying for any type of test.

3. Of the thirty subjects in the completion group, twenty-six, or
eighty-seven per cent, say that their studying was for details; that they
would have used the same method for a true-false or multiple-choice
test; and that they would have studied to obtain a general view of the
material for an essay test. Of the twenty-six subjects, ten, or
thirty-eight per cent, hold that they tried to learn the details more
thoroughly for this type of examination than they would for a true-false
or multiple-choice examination. Of the thirty subjects, four, or
thirteen per cent, claim that they used the same method as they would
have used for any other type, i.e., the essay test method.

4. Of the thirty-two subjects in the essay group, twenty-seven, or
eighty-four per cent, state that they studied for this type of
examination by attempting to get a general view of the material whereas
they would have studied the material for details if they had been
studying for one of the objective types of examination. Of the
twenty-seven individuals, eleven, or forty-one per cent, report that
they would have studied the details more thoroughly for a completion
test than for either of the other two types of objective examinations.
Of the thirty-two subjects, five, or sixteen per cent, hold that they
used the same method in studying for this type of examination as they
would have used for any other type.


RESULTS OF THE OBJECTIVE ANALYSIS 


The objective analysis of the subjects’ methods of study, as has
been pointed out, was made by examining the notes which the subjects
had taken during the learning periods and by examining the
mimeographed booklets themselves. It was found that the objective
methods which the subjects used were six in number: (1) the
underlining of words, phrases and sentences in the booklet; (2) the
listing of names, places, dates and numbers; (3) the taking of random
notes, i.e., notes which had no organization but were more than mere
listings of names, etc.; (4) the making of summaries in paragraph form;
(5) the drawing of maps; and (6) the framing of practice test questions.

Table I gives the number of individuals in each group, the average
number of the foregoing methods used by each group, the standard
deviation of each distribution, the standard deviation of each mean,
and the critical ratios¹ computed from these data.


TABLE I.—COMPARING THE VARIOUS GROUPS AS TO THE NUMBER OF METHODS OF
STUDY USED

                |    |      |      | SD of |        Critical ratios
Group           | N  | Mean | SD   | mean  | Essay | Completion | T-F
Essay           | 32 | 2.47 | 1.20 | .21   |       |            |
Completion      | 30 | 2.27 | 1.46 | .37   |  .59  |            |
True-false      | 31 | 1.78 | 1.49 | .26   | 2.09  |   1.29     |
Multiple-choice | 31 | 2.26 | 1.39 | .25   |  .64  |    .03     | −1.33




The critical ratios in Table I, although showing no completely
reliable differences, tend to indicate that the subjects in the
true-false examination set group do not use so many objective methods of
study as do the subjects in any other groups. There is not even an
indication of a true difference among the other groups.

1 These critical ratios are d/σ_d. The criterion for a reliable
difference used in this study is d/σ_d = 3.
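A critical ratio of the kind tabled above can be reproduced with a brief sketch. It assumes that the standard deviation of a difference between two independent means is the square root of the sum of the squared standard deviations of the means, and that each tabled ratio is the column group minus the row group; neither point is stated in the article, and the function name is hypothetical.

```python
import math


def critical_ratio(mean_col, sd_mean_col, mean_row, sd_mean_row):
    """Difference between two means divided by the SD of that difference."""
    sd_diff = math.sqrt(sd_mean_col ** 2 + sd_mean_row ** 2)
    return (mean_col - mean_row) / sd_diff


# Essay (column) against multiple-choice (row), values from Table I.
cr = critical_ratio(2.47, 0.21, 2.26, 0.25)
print(round(cr, 2))  # 0.64
print(abs(cr) >= 3)  # False
```

The result agrees with the tabled .64 and falls well short of the criterion of three.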

A slightly different picture with reference to the number of methods 
of study used by the various groups may be obtained by comparing 
(1) the percentages of individuals in the various groups who used no 
objective method of study and (2) the percentages of individuals in the 
various groups who used either no or one objective method. Table 
II gives the percentage of individuals in the various groups who used 
no objective method, the standard deviations of these percentages and 
the critical ratios computed from these data. Table III gives the 
same data together with the critical ratios computed from them for the 
individuals who used either no or one method. 


TABLE II.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE OF
INDIVIDUALS USING NO OBJECTIVE STUDY METHOD

                |    | Number    | Per cent  |          |        Critical ratios
                |    | using no  | using no  | SD of    |
Group           | N  | method    | method    | per cent | Essay | Completion | T-F
Essay           | 32 |  3        |  9        | .051     |       |            |
Completion      | 30 |  3        | 10        | .055     | −.13  |            |
True-false      | 31 | 10        | 32        | .084     | −2.35 |  −2.20     |
Multiple-choice | 31 |  3        | 10        | .054     | −.14  |    .00     | 2.20
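The percentage comparisons follow the same pattern. The sketch below assumes that the standard deviation of a percentage is the usual binomial value √(pq/N) and that the critical ratio is again column minus row over the root of the summed squared standard deviations; the article gives neither formula, and the function names are hypothetical. The inputs are the rounded percentages of Table II.

```python
import math


def sd_of_proportion(p, n):
    """Standard deviation of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)


def critical_ratio(p_col, sd_col, p_row, sd_row):
    """Difference between two proportions over the SD of the difference."""
    return (p_col - p_row) / math.sqrt(sd_col ** 2 + sd_row ** 2)


# Rounded percentages and group sizes from Table II.
essay_p, essay_n = 0.09, 32
tf_p, tf_n = 0.32, 31

sd_essay = sd_of_proportion(essay_p, essay_n)
sd_tf = sd_of_proportion(tf_p, tf_n)
print(round(sd_essay, 3), round(sd_tf, 3))  # 0.051 0.084
print(round(critical_ratio(essay_p, sd_essay, tf_p, sd_tf), 2))  # -2.35
```

Both standard deviations and the essay against true-false ratio agree with the tabled entries under these assumptions.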


























TABLE III.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE OF
INDIVIDUALS USING NO OR ONE OBJECTIVE STUDY METHOD

                |    | Number    | Per cent  |          |        Critical ratios
                |    | using 0   | using 0   | SD of    |
Group           | N  | or 1      | or 1      | per cent | Essay | Completion | T-F
                |    | method    | method    |          |       |            |
Essay           | 32 |  5        | 16        | .065     |       |            |
Completion      | 30 | 11        | 37        | .088     | −1.91 |            |
True-false      | 31 | 14        | 42        | .089     | −2.36 |   −.40     |
Multiple-choice | 31 | 10        | 32        | .084     | −1.51 |    .41     | .82





Table II indicates that there may be a true difference between the
percentage of individuals in the true-false set group and the percentage
of individuals in each of the other three examination set groups who
in studying for their respective types of examinations use no objective
method of study. In each case the percentage of individuals is greater
for the true-false group. No indications of a true difference are present
among the other groups in respect to this matter.

Table III indicates, however, that for no or one objective method 
there may be a true difference between the essay and each of the other 
groups. In each case the percentage of individuals in the other three 
groups is greater. No indications of a true difference are present 
among the other groups in respect to this matter. 

The methods of study in the various groups have been further
analyzed with respect to the differences in percentages of individuals
in those groups who used some particular objective method of study.
The data on underlining and the critical ratios based on these data
are presented in Table IV.


TABLE IV.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE OF
INDIVIDUALS WHO UNDERLINED THE LEARNING MATERIAL

                |    | Number    | Per cent  |          |        Critical ratios
                |    | using     | using     | SD of    |
Group           | N¹ | under-    | under-    | per cent | Essay | Completion | T-F
                |    | lining    | lining    |          |       |            |
Essay           | 29 | 12        | 41        | .091     |       |            |
Completion      | 27 | 18        | 67        | .090     | −2.03 |            |
True-false      | 21 | 16        | 76        | .093     | −2.69 |   −.70     |
Multiple-choice | 28 | 17        | 61        | .092     | −1.55 |    .46     | 1.15

1 N in this and all subsequent tables is the number of individuals who used
some objective method.


The critical ratios in Table IV while not meeting our criterion for 
reliability indicate that there may be a true difference between the 
essay group and each of the other three groups with respect to the 
percentage of individuals using underlining as a method of study. 
In each case the percentage is less for the essay group. The only 
other indication of a difference is between the true-false and multiple- 
choice groups, the percentage of cases using underlining being greater 
in the former group. 

The data on the listing of names, dates, numbers, etc., as a method 
of study and the critical ratios based on these data are shown in Table 
V. The critical ratios in this table indicate that there are no statis- 
tically reliable differences nor are there even any indications of true 
differences among the various groups with respect to the percentages 
of individuals who used listing as a method of study. 













TABLE V.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE OF
INDIVIDUALS WHO USED LISTING AS A METHOD OF STUDY

                |    | Number    | Per cent  |          |        Critical ratios
                |    | using     | using     | SD of    |
Group           | N  | listing   | listing   | per cent | Essay | Completion | T-F
Essay           | 29 | 13        | 45        | .092     |       |            |
Completion      | 27 | 13        | 48        | .096     | −.23  |            |
True-false      | 21 | 11        | 52        | .110     | −.50  |   −.27     |
Multiple-choice | 28 | 11        | 39        | .092     |  .46  |    .69     | .93














In Table VI are given the data on the taking of random notes as a 
method of study. The critical ratios based on these data in no 
case indicate statistically reliable differences. These critical ratios 
suggest however that there may be a true difference between the true- 
false group and both the multiple-choice and completion groups, there 
being a greater percentage of cases in the true-false group than in 
either of these two groups who used this method. One other difference 
is indicated—a greater percentage of cases in the essay group used 
this method than in the completion group. None of these differences, 
however, is completely reliable. 


TABLE VI.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE OF
INDIVIDUALS WHO USED THE TAKING OF RANDOM NOTES AS A METHOD
OF STUDY

                |    | Number    | Per cent  |          |        Critical ratios
                |    | taking    | taking    | SD of    |
Group           | N  | random    | random    | per cent | Essay | Completion | T-F
                |    | notes     | notes     |          |       |            |
Essay           | 29 | 13        | 45        | .092     |       |            |
Completion      | 27 |  7        | 26        | .084     | 1.52  |            |
True-false      | 21 | 11        | 52        | .109     | −.49  |  −1.88     |
Multiple-choice | 28 | 10        | 36        | .091     |  .70  |   −.81     | 1.13


























The data on the making of summaries as a method of study and the
critical ratios based on these data are shown in Table VII. The only
differences indicated by the critical ratios are between the essay
group and each of the other three groups. The percentage of
individuals in the essay group making summaries as a method of study is
greater than in any of the other three groups. These differences are
not reliable.


TABLE VII.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE
OF INDIVIDUALS WHO MADE SUMMARIES AS A METHOD OF STUDY

                |    | Number    | Per cent  |          |        Critical ratios
                |    | making    | making    | SD of    |
Group           | N  | summaries | summaries | per cent | Essay | Completion | T-F
Essay           | 29 | 13        | 45        | .092     |       |            |
Completion      | 27 |  9        | 33        | .090     |  .93  |            |
True-false      | 21 |  5        | 24        | .093     | 1.60  |    .70     |
Multiple-choice | 28 |  9        | 32        | .088     | 1.02  |    .08     | −.63


























In Table VIII are given the data on the making of maps as a
method of study. The critical ratios computed from these data are
also given. These critical ratios indicate that there may be a true
difference between the essay group and each of the other groups. The
percentage of individuals in the essay group making maps as a method
of study is larger than in any of the other three groups. These
differences though approaching reliability are not completely reliable.


TABLE VIII.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE OF
INDIVIDUALS WHO MADE MAPS AS A METHOD OF STUDY

                |    | Number    | Per cent  |          |        Critical ratios
                |    | making    | making    | SD of    |
Group           | N  | maps      | maps      | per cent | Essay | Completion | T-F
Essay           | 29 | 24        | 83        | .070     |       |            |
Completion      | 27 | 14        | 52        | .096     | 2.63  |            |
True-false      | 21 | 12        | 57        | .108     | 2.01  |   −.35     |
Multiple-choice | 28 | 17        | 61        | .092     | 1.90  |   −.68     | −.28











The data on the making out of questions as a method of study and
the critical ratios computed from these data are to be found in Table
IX. This table shows that the percentage of individuals in the
true-false group who made out questions as a method of studying is less
than the percentage in any other group. The difference between the
percentage of the true-false group and that of the completion group
is reliable. The differences approach the criterion of reliability,
three, when the percentage of the true-false group is compared with
either the percentage of the essay group or the percentage of the
multiple-choice group. Another difference, although one which is
not very significant, is found between the completion and essay groups.
In this case the percentage of individuals in the completion group who
made out questions as a method of study is greater.


TABLE IX.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE OF
INDIVIDUALS WHO MADE OUT QUESTIONS AS A METHOD OF STUDY

                |    | Number    | Per cent  |          |        Critical ratios
                |    | making    | making    | SD of    |
Group           | N  | out       | out       | per cent | Essay | Completion | T-F
                |    | questions | questions |          |       |            |
Essay           | 29 |  4        | 14        | .064     |       |            |
Completion      | 27 |  7        | 26        | .084     | −1.13 |            |
True-false      | 21 |  0        |  0        | .000     | 2.19  |   3.10     |
Multiple-choice | 28 |  6        | 21        | .077     | −.70  |    .44     | −2.73


DISCUSSION 


It would seem that the differences in methods of study found among 
the various groups are enough to account for the differences in test 
results which were indicated in the first article of this series. The 
differences in test results between the essay group and each of the 
recognition groups may possibly be explained by the following differ- 
ences in methods of study. In the first place the reports of the 
subjects in the essay group indicated that they studied to get a general 
view of the material whereas the reports of the subjects in the true- 
false and multiple-choice groups indicated that they studied to get 
details only. Secondly, the subjects in the essay group used more 
objective methods of study than the subjects in the true-false group; 
and a smaller percentage of subjects in the essay group used no or just 
one objective method than in the true-false or multiple-choice groups. 
Thirdly, as far as specific objective methods of study are concerned, 
a smaller percentage of individuals in the essay group underlined parts 
of the learning material, a greater percentage made summaries, and a 
greater percentage made maps than in either the true-false or multiple- 
choice groups. This probably indicates that underlining is a method 
which is used in studying material for details whereas the making of 
summaries and maps are methods used to organize material which is 
being learned in order to obtain a general picture of it. 







The differences between the completion group and each of the 
recognition groups on the various tests may possibly be explained with 
reference to the following differences in methods of study. Although 
the individuals in the completion group reported that they studied in 
the same manner as the individuals in the true-false and multiple-
choice groups, i.e., to get details, the general indication is that studying
for a completion examination involves more effort than studying for 
either of the recognition types of examination. This, if true, might 
account in part for the superiority of the completion group. In the 
second place a smaller percentage of individuals in the completion 
group used no objective methods of study than in the true-false group. 
This difference, however, was not present between the completion and 
multiple-choice groups. Thirdly, a second difference was also 
indicated only between the completion and true-false groups and that 
was that a smaller percentage of individuals in the completion group 
took random notes. Fourthly, a larger percentage of individuals in 
the completion group made out questions than in the true-false group. 

That no one of these differences explains the results found is 
indicated by the fact that not only were there in some cases differences 
between only one of the recall groups and only one of the recognition 
groups as to the methods of study but the recall groups differed 
between themselves as to the methods of study used. As a matter 
of fact the essay group differed from the completion group as to 
methods of study in the same way as it differed from the recognition 
groups. The same differences were found with reference to: (1) The 
subjects’ reports as to their methods of study with the exception that 
the individuals in the completion group indicated that they studied 
the details more thoroughly than the individuals in the recognition 
groups did; (2) the percentage of individuals using either no or one 
objective study method; (3) the percentage of individuals using under- 
lining; (4) the percentage of individuals making maps; and (5) the 
percentage of individuals making summaries. The completion group 
in turn differed from the essay group in the same fashion as it differed 
from the recognition groups with respect to the percentage of indi- 
viduals taking random notes. Besides there is an indication that the 
completion group uses the making out of questions as a method more 
than the essay group does. These differences in methods of study 
between the essay group and completion group probably explain the 
differences in test results between these two groups. 










The differences in methods of study between the two recognition 
groups probably explain also the existing differences in test results 
between those two groups. The percentage of individuals underlining 
or taking random notes was not so great in the multiple-choice group. 
On the other hand, the percentage of individuals making out questions 
was smaller in the true-false group. 


SUMMARY 


The examination set is of fundamental importance in determining 
the methods of study which the individual uses in learning. This is 
indicated by both the objective and subjective analysis of the methods 
of study used by the various examination set groups. Seemingly it 
is the differences in the number and combination of methods used 
which explain the differences in results among the various groups. 
Some of the differences in methods of study among the groups were the 
following. 

(a) The individuals in the essay group reported that they studied 
to obtain a general picture of the material whereas the individuals in 
the other three groups reported that they studied to learn the details 
only. 

(b) A smaller percentage of individuals in the essay group used 
no or just one of the objective methods of study than in any of the 
other three groups. 

(c) A smaller percentage of individuals in the essay group used 
underlining and a greater percentage made summaries and maps than 
in any of the other three groups. 

(d) The individuals in the completion group reported that they 
studied with more effort than they would have for a recognition test; 
while the individuals in the recognition groups reported that they 
studied with less effort than they would have for a completion test. 

(e) A smaller percentage of individuals in the completion group
used no objective method of study than in the true-false group. 

(f) A smaller percentage of individuals in the completion group 
took random notes than in the true-false group. 

(g) A greater percentage of individuals in the completion group 
made out questions than in the essay group. 

(h) A smaller percentage of individuals in the true-false group 
made out questions than in any of the other three groups. 





THE EFFECT OF IMPROVEMENT IN READING 
ABILITY ON INTELLIGENCE TEST SCORES 


J. W. HAWTHORNE 
Washington University, St. Louis, Mo. 


Intelligence tests from their inception have been subject to many 
attacks. A common criticism, implied if not overtly expressed, is 
that individual and especially group tests of intelligence are often 
invalidated by their dependence on reading ability. 

Thus Gates¹ states that “tests which require reading – such as
the National and most other ‘verbal’ tests – are not satisfactory with
children who have reading defects.” Pintner² says that the “best
known group tests at the present time depend largely if not entirely
upon the knowledge and use of language.” Burt³ remarked that
“education and particularly linguistic attainments affect the result
of the Binet-Simon scale more profoundly than almost any other
factor.”

In many clinical studies the question of the influence of reading 
ability on intelligence tests has been raised. Orton⁴ cites the instance
of a boy whose chronological age was sixteen years, two months,
whose mental age was eleven years, four months, and whose IQ was
seventy-one, as derived from the Stanford-Binet test. When the test
was repeated using alternatives to avoid reading, an MA of thirteen
years, ten months and an IQ of eighty-six were obtained. Even the latter score was
felt by the examiner not to be representative of his true intelligence 
score but to be affected by his reading disability. 
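
The two quotients in Orton's case can be checked against the ratio definition of the IQ, 100 × MA / CA. The sketch below is illustrative only. It assumes that the retest mental age reads thirteen years, ten months, and that chronological age is capped at sixteen years when the ratio is formed, a convention associated with the Stanford-Binet of that period. The function name and the cap parameter are hypothetical and do not come from the text.

```python
def ratio_iq(ma_years, ma_months, ca_years, ca_months, ca_cap_years=16):
    """Ratio IQ = 100 * MA / CA, both in months.

    Assumption: chronological age is capped at `ca_cap_years`
    (16 here) before the ratio is taken.
    """
    ma = ma_years * 12 + ma_months
    ca = min(ca_years * 12 + ca_months, ca_cap_years * 12)
    return round(100 * ma / ca)


# Orton's case: CA 16 years 2 months
first_iq = ratio_iq(11, 4, 16, 2)    # MA 11 years 4 months
retest_iq = ratio_iq(13, 10, 16, 2)  # MA 13 years 10 months (assumed reading)
print(first_iq, retest_iq)
```

Under these assumptions the two calls reproduce the reported quotients of seventy-one and eighty-six; without the cap the first quotient would come out one point lower.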

A similar case is mentioned by Root,⁵ a boy whose CA was twelve
years, whose MA was eleven years and whose reading age was nine 
years. Root states that he was confident that the boy would become 
the equal of his classmates in intelligence (which was one hundred ten 
IQ) when he had overcome his reading disability. 

Durrel, in an unpublished Master’s thesis reported by Monroe,⁶
studied “The Effects of Special Reading Disability on Performance on
the Stanford-Binet Tests” and concluded that a correction of 4.75
should be applied to the IQ in cases of reading retardation.

Webb⁷ studied the relation between reading ability and intelligence
as tested by group intelligence tests of the pencil and paper type. His
subjects were students in his own psychology class, consisting of
college sophomores, juniors, and seniors. These subjects were given
the Army Alpha Intelligence Tests, two group tests of Thurstone, and
Monroe’s Standard Silent Reading Test. He found the following
correlations:


Total Alpha scores with Monroe rate....................... .49 
Total Alpha scores with Monroe comprehension.............. .68
Thurstone A and B with Monroe rate....................... .64 
Thurstone A and B with Monroe comprehension............. .58 


He concluded from these results that “Rapidity in reading is one
of the large factors in determining the grade one makes in a pencil
and paper intelligence test such as was employed in this study,”
and also that “The Standardized Silent Reading Test employed in
this study is a fairly good intelligence test.” He further states that
“the premium put on rapidity of comprehension in our intelligence
tests may be, and probably is, one of the causes of the low correlation
so often reported to exist between intelligence tests and scholarship
grades.”

Gates⁸ carried on a somewhat different and more pertinent study
with seventy pupils in grades three to six of elementary school. For 
two years they were given intensive training in arithmetic, spelling, 
and reading. At the end of this time they were retested in intelligence 
with the Stanford-Binet tests. They showed changes in IQ varying 
from a twelve point decrease to an eighteen point increase, with an 
average change of six points. He found, however, no correlation 
between improvement in scholastic attainment and improvement in 
intelligence test scores. 

Many have noted and commented on the high correlation between
reading tests and intelligence tests. A program of freshman testing
at Washington University showed, for example, that in 1932 the
correlation between the Otis Self-Administering Tests and the Iowa
Reading Tests was .71 and that in 1933 the correlation between the
Thurstone American Council Test and the Nelson-Denny Reading
Test was .61. Such statements as that made by Webb (above) that
a reading test is a fairly good intelligence test were not, however,
fully verified if scholastic attainment is considered as a criterion of
validity. Thus the following correlations with college grades were
obtained:


Otis Self-Administering Test (’32)......................... .33
Iowa Reading Test (’32)................................... .43
Nelson-Denny Reading Test (’33)........................... .44
Thurstone Intelligence Test (’33)......................... .58










THE PRESENT STUDY: PART I


The mere fact that there is a high correlation between two variables 
is no reason for concluding that either variable is the cause of the 
other; however, such an assumption might seem justified on a priori 
grounds. The fact that there is a high correlation between reading 
test scores and intelligence test scores does not necessarily mean that 
the intelligence test score, even on a group language test, is dependent 
on reading ability. 

If one variable, e.g. intelligence test score, is not only related to 
another, e.g. reading test score, but is also dependent on it, then a 
change in the second variable should be accompanied by a change 
in the first. In other words, if there is a marked improvement in 
reading test score, there should be a concomitant change in intelligence 
test score. 

The present study is an attempt to determine the effect of improvement
in reading test scores upon the scores made in group language
intelligence tests. For this problem it was deemed advisable to treat 
a group of pupils, preferably retarded in reading and in the average 
school environment, in the following manner: 

1. Obtain intelligence scores on the basis of a group language 
intelligence test. 

2. Obtain reading scores from a standardized reading test. 

3. Give specific remedial teaching in reading for a period of one 
semester. 

4. Obtain reading scores after the remedial instruction from another 
form of the same silent reading test as was given before the reading 
instruction. 

5. Give a retest in intelligence with the same, or another form of 
the same, intelligence test as was used before the reading instruction. 

If intelligence test scores were seriously affected by reading ability 
we should expect an improvement in intelligence scores when and if 
reading test scores are improved. 


SOURCE OF DATA 


Pupils from the fifth through the twelfth grades from four different 
schools in the vicinity of St. Louis were used in this study. 

Group I consisted of thirty-seven pupils in the ninth grade who
were found to be retarded in reading ability. Initial IQ’s were
determined from the tests given previously, mostly Terman Group
but in some cases from National Group, Detroit Alpha, and Illinois
General Examination.

For this group special emphasis was placed upon reading for com- 
prehension; remedial drill in reading was carried on for one semester, 
from September, 1931, to January, 1932. At the end of this time 
Form III of the Thorndike-McCall Reading Test was given, and
Form A of the Terman Group Test of Mental Ability. 

Group II was made up of twenty-one pupils from ninth to twelfth 
grades who had repeatedly failed in academic work and in whose cases 
it was felt that a reading difficulty might be a factor in the failures. 
In September, 1931, they were given the Monroe Standard Silent 
Reading Test, Test III, Form 1, and the Terman Group Intelligence 
Test, Form A. They were divided into two classes which met every 
day for drill, and remedial teaching in reading was carried on until 
the end of the semester, January, 1932, when they were retested with 
Form II of the Monroe Reading Test and Form A of the Terman 
Intelligence Test. 

Group III represents fifty-six pupils in the fifth, sixth, and seventh 
grades. These pupils were given the Terman intelligence test and 
the Gates Silent Reading Tests in September, 1929. Following this 
testing all the pupils were given specific remedial work in reading, 
and at the end of the semester in January were retested with the Ter- 
man Intelligence Test. 

Group IV consisted of twenty-one junior high school pupils from
the sixth to ninth grades inclusive who were noticeably retarded in 
reading. They were given the Ingraham Clark Reading Test and 
the Terman Group Intelligence Test in January, 1931, and after a 
five month period of intensive work in remedial reading were retested 
with another form of the reading test and the same form of the intelli- 
gence test. 

Table I summarizes the nature of the tests and the subjects used 
in this study. 


DESCRIPTION OF TESTS USED IN THE STUDY 


The Terman group intelligence test and three reading tests, the 
Thorndike-McCall Reading Scale, the Monroe Silent Reading Test, 
and the Gates Silent Reading Tests were used in this study. 

The Thorndike-McCall Reading Scale is a silent reading test com-
posed of paragraphs, which are to be read, and questions which are to
be answered at the end of each paragraph. It measures only compre-
hension. It separates the good and the poor readers, but does not
give any analytical diagnosis of the specific difficulties of the indi-
vidual. Current and Ruch⁹ found the reliability of this test to be .75.
Mosher,¹⁰ in a study to check the result of Current and Ruch, found a
reliability of .72. The various forms of the test seem to be of about
the same difficulty and to give the same scores distributed approxi-
mately within the same range.





TABLE I. DESCRIPTION OF THE DATA

Group | Number of pupils | Range of grades | Range of CA | Range of IQ | Range of reading retardation or advancement | Reading tests used | Intelligence tests used
------|------------------|-----------------|-------------|-------------|---------------------------------------------|--------------------|------------------------
I | 27 | 9th | 13-16 years | 72-118 (Mean = 95.91) | 3.9 years below school grade to 1.5 years above school grade | Thorndike-McCall Reading Test: I, III | Terman Group Test A
II | 21 | 9th-12th | [illegible] | [illegible] (Mean = 94.90) | see footnote 1 | Monroe Reading Test, Test III, Form I and II | Terman Group Test A
III | 56 | 5th-7th | 10-14 years | 71-145 (Mean = 100.8) | 2.88 years below school grade to 1.5 years above school grade | Gates Silent Reading Tests | Terman Group Test A
IV | 21 | 6th, 7th, 8th | 11-15 years | 83-120 (Mean = 100.1) | 3.8 years below school grade to 1.5 above school grade | Ingraham-Clark | Terman Group Test A

¹ Group II could not be used in all of the analyses since the Monroe Reading Test norms are not
comparable to those of the other tests.


The Monroe Standardized Silent Reading Test scores both rate
and comprehension. It is composed of twelve paragraphs to be
read, and certain questions to be answered at the end of each of the
paragraphs. Only five minutes are allowed. This test does not give
any analytical diagnosis of the particular difficulties of the individual.
It cannot be considered an accurate score of the rate since the time
used for answering the questions is included in the rate score. Current
and Ruch¹¹ found the reliability for this test to be .76. Mosher¹²
found a reliability of .77.

The Gates Silent Reading Tests include four types, each testing a
different technique of reading. They are: Type A, Reading to
Appreciate the General Significance of a Paragraph; Type B, Reading
to Predict the Outcome of Given Events; Type C, Reading to Under-
stand Precise Directions; Type D, Reading to Note Details. These
tests have the advantage of being more discriminating and of diagnos-
ing to some extent the particular difficulty. From the raw score the
reading age and grade may be determined. The reading age and the
reading grade represent the age and grade of the average pupil of
average mentality answering correctly a given number of paragraphs.

The Ingraham-Clark Diagnostic Reading Test consists of two parts, 
the first testing word forms and meanings and the second part testing 
sentence and paragraph meanings. It has the advantage of being 
diagnostic of reading difficulty and in addition is well standardized on 
over twenty thousand pupils. Reliabilities, calculated on single 
grades, range from .82 to .95. Its validity was determined by com- 
parison with a battery of other reading tests, consisting of the New 
Stanford, Gates, Los Angeles Elementary, Sangren-Woody, and
Detroit. 

The Terman Group Test of Mental Ability consists of ten tests: 
Information, Best Answer, Word Meaning, Logical Solution, Arith- 
metic, Sentence Meaning, Analogies, Mixed Sentences, Classification, 
and Number Series. Norms were determined by a comparison of the 
scores of three hundred six pupils on the Terman Test with their scores 
on the Stanford-Binet Tests. The reliability as found by Wyman and
Wendle¹³ is .85 ± .03.


TREATMENT OF THE DATA 


The difference between the mean reading grade before the remedial
instruction and the mean reading grade after the remedial instruction,
and the difference between the mean IQ before the remedial teaching
and the mean IQ after remedial instruction, were determined. These
differences are shown in Table II. The mean reading grade of Group I
before remedial work was 7.25, which represents a retardation of
1.75 years below the present school grade. After remedial work,
the mean reading grade of this group was 9.07, or about .43 years
below present school grade. The improvement then for this semester
of special remedial work was 1.82 years (PE = .28), as compared
with .42 years improvement which one would expect under normal
training. This shows an appreciable improvement in reading as a
result of this specialized teaching. The mean IQ before remedial
teaching was 95.91 and after remedial teaching was 98.22, which
represents a difference of 2.31 in IQ with a probable error of 1.73.
This is hardly a significant difference since it is only 1.3 times the
PE of the difference.
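
The figures for Group I can be reproduced with a short calculation. The sketch below assumes that the probable error of the difference was obtained as the square root of the sum of the squared probable errors of the two means, the usual formula for uncorrelated means. The text does not state which formula was used, and since the same pupils were tested twice a formula allowing for correlation would give a somewhat different value. The function names are hypothetical.

```python
import math


def pe_of_difference(pe_a, pe_b):
    """PE of a difference between two means, assuming the means are uncorrelated."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)


def critical_ratio(difference, pe_diff):
    """Number of probable errors by which the difference departs from zero."""
    return difference / pe_diff


# Group I, IQ before and after remedial instruction
pe_diff = pe_of_difference(1.33, 1.11)
ratio = critical_ratio(98.22 - 95.91, pe_diff)
print(round(pe_diff, 2), round(ratio, 1))
```

With these inputs the sketch gives a probable error of about 1.73 and a ratio of about 1.3, in agreement with the values quoted. A ratio of roughly four was the customary requirement for significance when probable errors were used, which is why the difference is described as hardly significant.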


TABLE II. COMPARISON OF PUPILS BEFORE AND AFTER REMEDIAL INSTRUCTION IN READING

Group | Grades | N | Mean IQ before RI | PE of mean | Mean IQ after RI | PE of mean | Difference between IQ I and IQ II | PE of difference
------|--------|---|-------------------|------------|------------------|------------|-----------------------------------|-----------------
I | 9 | 27 | 95.91 | 1.33 | 98.22 | 1.11 | +2.31 | 1.73
II | 9, 10, 11, 12 | 21 | 94.90 | .99 | 93.43 | 1.23 | −1.47 | 1.58
III | 5, 6, 7 | 56 | 100.8 | 1.24 | 101.32 | 1.05 | +.52 | 1.63
IV | 7, 8, 9 | 21 | 100.1 | [illegible] | 100.1 | [illegible] | [illegible] | .10

Group | Grades | N | Mean RG before RI | PE | Mean RG after RI | PE | Difference between RG I and RG II | PE
------|--------|---|-------------------|----|------------------|----|-----------------------------------|---
I | 9 | 27 | 7.25 | .16 | 9.07 | .22 | +1.82 | .28
II | 9, 10, 11, 12 | 21 | 46.88 (raw score) | 1.51 | 70.88 (raw score) | 2.56 | +24.00 | 2.99
III | 5, 6, 7 | 56 | 5.20 | .13 | 6.02 | .13 | +.82 | .02
IV | 7, 8, 9 | 21 | 6.20 | .14 | 7.90 | .16 | +1.70 | .21

RI = Remedial Instruction.
RG I = Reading Grade before RI.
RG II = Reading Grade after RI.
IQ I = Intelligence Quotient before RI.
IQ II = Intelligence Quotient after RI.


In Group II, because the Monroe Reading Test was used, the exact 
reading grade could not be obtained and it was necessary to use the 
raw scores for comparison. The average of the comprehension and 
the rate score was taken, and the mean found to be 46.88 on the first 
test, given before remedial work and 70.88 on the second test given 
after remedial work. These scores show an improvement of 24.00 
(PE = 2.99) points, a gain of over fifty per cent in test scores from 
September to January. In the intelligence tests the mean IQ before 
remedial reading work was 94.90. On the retest after remedial 
reading instruction the mean IQ was 93.43, an insignificant difference 
of 1.47 points (PE = 1.58), and in quite obvious contrast with the 
large gain in reading ability. 
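
The statement that the raw reading scores of Group II rose by over fifty per cent can be checked directly from the two means. The helper below is a hypothetical illustration and not part of the original analysis.

```python
def percent_gain(before, after):
    """Gain expressed as a percentage of the initial score."""
    return 100 * (after - before) / before


# Group II mean raw scores on the Monroe test, September and January
gain = percent_gain(46.88, 70.88)
print(round(gain, 1))
```

The result is a little above fifty-one per cent, consistent with the description in the text.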










In Group III the mean reading grade was 5.20, a retardation of 
one year below the current school grade. On the second reading 
test, after remedial instruction, the mean reading grade was 6.02, 
which represented an improvement of .82, (PE = .02) during a period 
in which one would expect an improvement of .42. In other words 
these pupils improved in reading about twice as much under special 
remedial teaching as one would expect them to improve under normal 
teaching. The mean IQ before remedial teaching was 100.80. The 
mean IQ on the retest after remedial instruction was 101.32, represent- 
ing no difference at all. 

Group IV showed a mean reading grade of 6.2 and a mean IQ 
of 100.1 before remedial instruction. After 5 months of remedial 
instruction in reading the mean reading grade was 7.9 and the mean 
IQ was found to be 100.1. Here we have a significant improvement
in reading score, over a five months period, of 1.7 (PE = .21) with no
corresponding improvement in intelligence test scores. 

In all four groups, representing over a hundred pupils in various 
grades and schools, an average improvement in reading ability was 
made which was at least twice what would be expected during the 
semester. However, in no case was the change in IQ significant. 
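
The comparison with expected progress can be laid out as a ratio of observed gain to the .42 years quoted as the normal gain for a semester. The sketch below uses the reading-grade gains listed in Table II. Group II is omitted because its scores are raw scores rather than reading grades, and applying the semester figure to the five-month period of Group IV is an assumption made here for illustration.

```python
EXPECTED_SEMESTER_GAIN = 0.42  # years of reading grade, as quoted in the text

# Mean reading-grade gains from Table II (Group II omitted: raw scores only)
reading_gains = {"Group I": 1.82, "Group III": 0.82, "Group IV": 1.70}

gain_ratios = {
    group: gain / EXPECTED_SEMESTER_GAIN for group, gain in reading_gains.items()
}

for group, ratio in gain_ratios.items():
    print(f"{group}: {ratio:.2f} times the expected gain")
```

On these figures Groups I and IV gained roughly four times the expected amount and Group III very nearly twice.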

These results are consistent with the many experiments in retesting 
the IQ without an intervening period of remedial reading instruction. 
These experiments have shown the IQ to be approximately constant 
and to vary only a few points between tests. Dearborn sums up
these results in the statement “Repeated tests of the same individuals
at yearly or more frequent intervals have shown that this ratio of
mental age to chronological age is fairly constant year after year.”¹⁴
Since reading ability is undoubtedly improving during the intervals 
between test and retest, one might seriously question statements that 
the group intelligence test measures reading to a large degree or that 
the scores are seriously affected by reading ability or disability. 

In spite of the failure of the average intelligence test score to 
improve along with an improvement in reading test score, it might 
still be possible that individual variations within the group would 
be such as to obscure an intrinsic interdependence of the two variables 
if such an interdependence really happened to be present. If this 
were the case, we should expect that individual changes in intelligence 
test scores would be positively correlated with changes in reading
test scores.


To compare the individual improvement in reading and in intelli-
gence scores, the change in the IQ and the change in reading were
correlated for Group I and Group III (see Table III). The correlation
found was −.17 (PE = .07), which, as in the case of the mean differ-
ence, is negligible, and points to the conclusion that in pupils above
the fifth grade, as much as three years retardation in reading, as
measured by a silent reading test, does not have a depreciating effect
upon the IQ, as measured by a group language intelligence test such as
the Terman Test.
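
The procedure of correlating change in reading with change in IQ can be sketched as follows. The pupil records in the example are invented solely to show the computation and are not data from the study. The product-moment formula is assumed, since the text does not name the coefficient used.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical records: (reading grade before, after, IQ before, IQ after)
pupils = [
    (7.0, 9.1, 96, 95),
    (6.5, 7.9, 101, 104),
    (8.1, 9.0, 90, 93),
    (7.4, 9.6, 99, 98),
    (6.9, 8.2, 94, 94),
]

reading_change = [after - before for before, after, _, _ in pupils]
iq_change = [after - before for _, _, before, after in pupils]

r = pearson_r(reading_change, iq_change)
print(round(r, 2))
```

If IQ depended on reading ability, pupils with the larger reading gains should also show the larger IQ gains, and the coefficient would be clearly positive.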

The initial retardation in reading was correlated with the IQ
(see Table III). If reading affects the IQ we should expect to find a
higher correlation between these two factors. Only a low correlation
of .27 (PE = .07), however, was found. This, although a positive
correlation, is not high enough to contradict seriously the insignificant
correlation between improvement in reading and the increase in IQ.

Supplementary, but related to the above problems, a correlation 
was made between the years advanced or retarded in reading, as 
measured by the initial test, and the years gained or lost as shown in 
the retest after remedial work. Miles found that those who read
most accurately at the beginning usually showed the least progress,
but that “various initial abilities apparently have little influence
on how much gain the class will make, or indeed whether it will gain
or lose.”¹⁵ In an experiment in learning the type of materials found
on intelligence tests, and fifteen forms of Part I of the Thorndike
Intelligence Examination for high school graduates, Race found that
“the measure of initial ability is a measure and a prophecy of improve-
ment; that native capacity in any one function determines what shall
be accomplished by that function.”¹⁶ The results of the present study
are more consistent with those of Miles than with those of Race, since
a negative correlation of −.17 ± .07 was found between these
two factors (see Table III).

















TABLE III

Correlations | r | PE | N
-------------|---|----|---
1. Years gain or loss in reading on retest with change in IQ | −.17 | .07 | 98
2. Initial retardation in reading with the IQ | .27 | .07 | 98
3. Years gain or loss after remedial teaching with years retarded or advanced before | −.17 | .07 | 98
4. Improvement in reading with IQ | .07 | .07 | 98
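
The probable errors attached to these coefficients can be compared with the conventional formula PE = .6745 (1 − r²) / √N. The text does not say how the probable errors were computed, so the formula is an assumption here, and the function name is hypothetical.

```python
import math


def pe_of_r(r, n):
    """Probable error of a correlation coefficient (conventional formula, assumed)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Coefficients reported in Table III, each with N = 98
for r in (-0.17, 0.27, 0.07):
    print(r, round(pe_of_r(r, 98), 3))
```

For the coefficients of −.17 and .07 the formula gives a value that rounds to .07, as tabled. For .27 it gives a value slightly above .06, so the tabled figure agrees to within .01.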













The original IQ and the amount of gain or loss in reading after
remedial teaching were correlated, and again insignificant results
were found, that is, .07 ± .07. One explanation of this lack of
relationship may be that most of the IQ’s were in the average group,
and very few, except those of the oldest pupils, indicate “dull”
intelligence. Therefore, intelligence could not be considered a factor
in the reading retardation. Brooks found results consistent with the
present study, namely that “correlation between mental ability and
one or two years improvement in silent reading are disappointingly
low, ranging from small negative values to positive correlations as
high as .30 for pupils of the same grade.”¹⁷ He felt, however, that if
material suited to individual needs were used, the correlations would
be higher. Miles¹⁸ found a negative correlation, of −.218, between
IQ and gain in reading ability. Race concludes from her studies
of improvability that “In the relation of improvability to general
ability, the more superior the general intelligence, the more significant
the learning process.”¹⁹


CONCLUSIONS 


This study of one hundred four pupils indicates: Pupils from the 
fifth to the twelfth grade, of average intelligence, and showing as 
much as three years retardation in reading, who have improved during 
a period of remedial teaching twice as much as would be expected 
under regular teaching, show: 

1. No corresponding improvement in intelligence such as one would 
expect if the intelligence tests were dependent to a large extent upon 
reading ability. 

2. A low correlation between initial retardation and the IQ. 

3. An insignificant relationship between the IQ and the amount of
improvement made in reading during this period of remedial instruc-
tion.

4. An insignificant relationship between the degree of initial
retardation and the degree of improvement in reading. 


BIBLIOGRAPHY 


1, 8. Gates, Arthur I.: The Improvement of Reading. New York, 1929.
2. Pintner, Rudolph and Patterson, Donald C.: “A Non-language Group
Intelligence Test.” Journal of Applied Psychology, Vol. III, 1919.
3. Burt, Cyril: Mental and Scholastic Tests. London, 1921.
4. Orton, Samuel T.: “‘Word Blindness’ in School Children.” Archives of
Neurology and Psychiatry, Vol. XIV, 1925.
5. Root, W. T.: “The Intelligence Quotient from Two View Points.” Journal
of Applied Psychology, Vol. VI, 1922.
6. Monroe, Marion: Diagnostic and Reading Examination (Manual of Direc-
tions). Institute for Juvenile Research, Chicago, Vol. III.
7. Webb, L. W.: “Ability in Mental Tests in Relation to Reading Ability.”
School and Society, Vol. XI, 1920.
9, 11. Current, W. F. and Ruch, G. M.: “Further Studies on the Reliability of
Reading Tests.” Journal of Educational Psychology, Vol. XVII, 1926.
10, 12. Mosher, Raymond M.: “Further Note on the Reliability of Reading
Tests.” Journal of Educational Psychology, Vol. XIX, 1928.
13. Wyman, J. B. and Wendle, Miriam: “What is Reading Ability?” Journal
of Educational Psychology, Vol. XII, 1921.
14. Dearborn, Walter F.: Intelligence Tests. New York, 1928.
15, 18. Miles, D. H.: “Can the High School Pupil Improve his Reading Ability?”
Journal of Educational Research, Vol. XIV, 1926.
16, 19. Race, H. V.: “Improvability, Its Intercorrelations and Relation to Initial
Ability.” Teacher's College Contribution, No. 124, 1927.
17. Brooks, Fowler D.: The Applied Psychology of Reading. New York, 1926.













THE RELATION OF CHRONOLOGICAL AGE TO 
ACHIEVEMENT IN THE STUDY OF FRENCH 


F. H. FINCH AND OLIVER R. FLOYD 


University of Minnesota 


The extent to which achievement in the study of language is 
related to age has been made the subject of little careful investigation. 
The nature of some of the more important studies of this relationship, 
together with the conclusions reached, will be discussed briefly before 
any new data are presented. 

Cheydleur* compared the achievement in French of thirty-nine 
adults ranging in age from eighteen to sixty-two years with that of 
fifty-four college freshmen. He used as a measure of achievement the 
American Council Alpha French Tests. When the adults had devoted 
forty-two class hours and the college students sixty-four class hours 
to instruction in French, the test revealed approximately the same 
average score in the two groups. The scores of the adults were more 
variable than were those of the younger students. When twice the 
above time had been devoted to the study of French, thirty-two adults 
and forty-six college students were again tested. The scores at this 
time indicated some superiority of the adult group. When the sections 
of the tests are considered separately, the adults appear superior in 
vocabulary and silent reading, but inferior in grammar. It is not 
easy to evaluate the effects of differences in time available or, what is 
more important, differences in motivation, interest, and ability arising 
from selective factors determining the composition of the two groups. 

Buswell® photographed the eye movements of subjects of various 
ages while reading French. Unfortunately data were collected on very 
small groups of students, and while great care was apparently exercised 
in obtaining records on the subjects being studied, it is difficult to 
determine how nearly these small samples are typical of the ages and 
grades which they represent. Furthermore, there may be some ques- 
tion as to the validity of photographic records of eye movements as 
measures of achievement in the reading of alanguage.! While Buswell 
concludes that the elementary school groups are distinctly inferior to 
the others, on certain selections his college freshmen make a poorer 
showing than either elementary or high school students, and on two 





1 Eurich⁵ found little relationship between scores on a number of reading tests 
and photographic records of eye movements on college students. 

of five selections read the fourth grade children show fewer regressive 
movements than any older group of students (p. 54). 

Thorndike,¹⁰ in his study on Adult Learning, reports that the 
composite gains in eight high school subjects, including French, 
German, Latin and Spanish, increase for age groups between the 
years fourteen and twenty. He also reports that among individuals 
studying Esperanto under carefully controlled conditions, age appears 
to be an advantage up to sixteen, and possibly to twenty. 

Henmon⁶ discusses unpublished studies by Li⁷ and by De Sauzé. 
Li tested in the junior and senior high school grades children who had 
studied French for the same length of time. There is in most of his 
groups some superiority for those beginning French in the senior high 
school, but the selective character of the senior high school is such as 
would probably result in marked differences in ability in the groups 
being compared. When the same data are grouped according to age 
instead of grade, the differences for the fourteen, fifteen, sixteen, and 
seventeen year groups are termed “negligible” by Henmon. In the 
case of German, Li finds no indication of superiority among those 
beginning in the tenth or eleventh grades as compared with those 
beginning in the ninth. 

De Sauzé tested at the end of one semester of French study seven 
hundred fifty students scattered through grades seven to eleven. His 
results show the seventh grade to be inferior on most counts, and give 
some suggestion that the ninth may equal or surpass the tenth and 
eleventh. Median intelligence by grades is reported, but since the 
test used in the seventh grade was different from that used with the 
more advanced students, his comparison of the groups in ability may 
not be valid. 

Moore,⁹ in her study of Pennsylvania high school graduates, under 
sixteen years of age, found that fifty children who had studied French 
for four semesters, and twenty who had studied French for six semes- 
ters, surpassed the state-wide average of high school graduates who 
had devoted equal amounts of time to the subject. Since the young 
students were on the whole superior in general intelligence, the effect 
of age per se can hardly be evaluated. 

The present study reports the results of the administration of the 
American Council French Tests¹,² to pupils in the University High 
School, University of Minnesota, after one, two and three years of the 
study of the language. The first group consists of one hundred 
forty-seven pupils ranging in chronological age from eleven to nineteen 
years when tested at the close of one year’s instruction in French. 
All of the pupils involved in this study had been tested with a series 
of five group intelligence tests. The tests used were Army Alpha 
eight, Haggerty Delta two, Miller, Form A, Pressey Senior Classifica- 
tion Test, and Terman, Form A. Scores on each test other than 
Terman were equated in terms of Terman according to the method 
described by Miller.⁸ The median IQ of each pupil, based upon the 
five equated values, was used as the measure of intelligence in all the 
correlations calculated for this study. 


TABLE I. CORRELATION BETWEEN AGE, ACHIEVEMENT, AND INTELLIGENCE 
First-year French Students 
N = 147 

        VOCABULARY                       GRAMMAR 
r₁₂    =  .101 ± .054           r₁₂    = −.117 ± .055 
r₂₃    =  .321 ± .050           r₂₃    =  .475 ± .043 
r₁₃    = −.405 ± .046           r₁₂.₃  =  .093 
r₁₂.₃  =  .273                  R₂₍₁₃₎ =  .482 
R₂₍₁₃₎ =  .408 
1 = Age    2 = Achievement    3 = Intelligence 

                  M        SD       Range 
Age             14.57     1.59      11-19 
IQ             119.05    12.16      95-168 
Vocabulary      27.25     7.02      6-43 
Grammar         16.20     6.92      3-37 


Table I portrays the zero order, partial and multiple coefficients 
of correlation between age, intelligence, and achievement on the 
indicated sections of the American Council Alpha French Test. 
Reference to Table I will indicate a correlation of .10 between age 
and achievement for vocabulary and one of −.12 for grammar. When 
intelligence was partialled out this relationship was raised to .27 for 
vocabulary and changed to .09 for grammar. Another indication of 
the significance of age in language study is revealed in the table. The 
zero order correlation between intelligence and achievement is .32 
for vocabulary and .48 for grammar. When the effect of age was 
also introduced and the multiple correlation computed, the correlations 
became .41 for vocabulary and .48 for grammar. It would therefore 
appear that in this case age adds little to the accuracy of prediction 
based upon intelligence alone. 
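
The partial and multiple coefficients discussed above can be checked against the zero-order correlations in Table I. The sketch below assumes the standard textbook formulas for a first-order partial correlation and a three-variable multiple correlation; the article does not state which formulas were used, and the function names are hypothetical.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3 (variable 3 held constant)."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def multiple_r(r12, r23, r13):
    """Multiple correlation R2(13): variable 2 predicted from 1 and 3."""
    numerator = r12 ** 2 + r23 ** 2 - 2 * r12 * r23 * r13
    return sqrt(numerator / (1 - r13 ** 2))


# Zero-order values as printed in Table I
# (1 = age, 2 = achievement, 3 = intelligence).
R13 = -0.405
grammar_partial = partial_r(-0.117, R13, 0.475)
grammar_multiple = multiple_r(-0.117, 0.475, R13)
vocab_multiple = multiple_r(0.101, 0.321, R13)

print(f"grammar r12.3  = {grammar_partial:.3f}")
print(f"grammar R2(13) = {grammar_multiple:.3f}")
print(f"vocab   R2(13) = {vocab_multiple:.3f}")
```

Under these assumptions the grammar figures agree with the printed .093 and .482 to within rounding, and they do so only when the age and grammar correlation is taken as negative, as it is printed in the table.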













Corresponding data obtained from other groups tested at the 
completion of courses in French of varying lengths are reported in 
Tables II to V. Unless otherwise indicated, the subject-matter test 
referred to is the American Council Alpha French Test. In two of the 
tables the computations are based upon results obtained with the 
American Council French Grammar Test. In each case the figures 
for chronological age refer to the age at which the study of French 
was begun. 


TABLE II. CORRELATION BETWEEN AGE, ACHIEVEMENT AND INTELLIGENCE 
First-year French Students¹ 
N = 85 

SILENT READING 
r₁₂    =  .085 ± .073 
r₂₃    =  .555 ± .051 
r₁₃    = −.321 ± .066 
r₁₂.₃  =  .337 
R₂₍₁₃₎ =  .620 
1 = Age    2 = Achievement    3 = Intelligence 

                    M        SD       Range 
Age               14.58     1.65      11-19 
IQ               119.50    11.87      95-164 
Silent reading     9.92     4.50      1-22 

1 All the cases included in this table are among the group found in Table I. 


TABLE III. CORRELATION BETWEEN AGE, ACHIEVEMENT AND INTELLIGENCE 
Second-year French Students 
N = 69 

        VOCABULARY                       GRAMMAR 
r₁₂    =  .191 ± .077           r₁₂    =  .109 ± .079 
r₂₃    =  .463 ± .063           r₂₃    =  .444 ± .065 
r₁₃    = −.320 ± .072           r₁₂.₃  =  .295 
r₁₂.₃  =  .40                   R₂₍₁₃₎ =  .517 
R₂₍₁₃₎ =  .585 
1 = Age    2 = Achievement    3 = Intelligence 

                  M        SD       Range 
Age             14.75     1.43      12-19 
IQ             119.16    12.60      93-164 
Vocabulary      35.9      8.80      19-53 
Grammar         28.94    10.25      7-45 

























TABLE IV. CORRELATION BETWEEN AGE, ACHIEVEMENT, AND INTELLIGENCE 
Second-year French Students 
N = 67 

AMERICAN COUNCIL FRENCH GRAMMAR TEST 
r₁₂    = −.333 ± .073 
r₂₃    =  .308 ± .074 
r₁₃    = −.468 ± .063 
r₁₂.₃  = −.219 
R₂₍₁₃₎ =  .375 
1 = Age    2 = Achievement    3 = Intelligence 

                  M        SD       Range 
Age             14.12     1.14      13-18 
IQ             121.08    12.35      94-168 
Grammar test    58.92    11.90      32-84 

TABLE V. CORRELATION BETWEEN AGE, ACHIEVEMENT, AND INTELLIGENCE 
Third-year French Students 
N = 79 

AMERICAN COUNCIL FRENCH GRAMMAR TEST 
r₁₂    =  .065 ± .075 
r₂₃    =  .20 ± .073 
r₁₃    = −.425 ± .062 
r₁₂.₃  =  .171 
R₂₍₁₃₎ =  .261 
1 = Age    2 = Achievement    3 = Intelligence 

                  M        SD       Range 
Age             13.69      .926     14.0-18.6 
IQ             122.12    11.72      100-168 
Grammar test    67.12    12.11      36-90 


For the groups of pupils studied, chronological age appears to be a 
relatively unimportant factor in determining success in the study of 
French. It should be pointed out that the individuals involved are, 
on the average, superior high school students with respect to general 
intelligence. The mean intelligence quotients of the five groups 
ranged from one hundred nineteen to one hundred twenty-two. 
Whether similar results would obtain for pupils of more nearly average 
ability merits further investigation. 




There is nothing evident in the data here assembled which would 
suggest that the study of French be reserved to the senior high school 
or that seventh and eighth grade pupils who commonly fall in chrono- 
logical age within the lower limits included in this study should be 
denied the opportunity to begin the study of French on grounds of 
immaturity. 

The content of the course in French presented in the eighth grade 
of the University of Minnesota High School in which the younger pupils 
in this study had been enrolled is roughly equivalent to the work in 
beginning French usually presented in senior or four-year high schools. 
Obviously achievement on this level comparable to that of older 
students cannot be expected in schools where the courses consist 
mainly of conversational French or where because of limited time 
devoted to this subject in the weekly program, opportunity to cover 
the material commonly included in the first year does not exist. 

A final answer as to the school grade at which the study of French 
should be begun cannot be given until similar studies have been made 
within all the subjects in the curriculum. When there is available 
information relative to the rôle of chronological age in achievement 
in each of the subject-matter fields, it should be possible to select the 
optimum grade for the introduction of each. Lacking this information 
and facing the practical situation of deciding whether to admit the 
study of French to the lower years of the junior high school, the 
decision, in the light of the material developed in this and previous 
studies, should be based upon factors other than chronological age. 


BIBLIOGRAPHY 


1. American Council Alpha French Test prepared by V. A. C. Henmon, Algernon 
Coleman, and Marion R. Trabue, World Book Co., 1927. 

2. American Council French Grammar Test (Selection Type) prepared by F. D. 
Cheydleur, World Book Co., 1927. 

3. Buswell, G. T.: A Laboratory Study of the Reading of Modern Foreign Languages. 
Publications of the American and Canadian Committees on 
Modern Languages, Macmillan, Vol. II, 1927. 

4. Cheydleur, F. D.: ‘‘An Experiment in Adult Learning of French at the Madi- 
son, Wisconsin Vocational School.” Journal Educational Research, Vol. 
XXVI, 1932, pp. 259-275. 

5. Eurich, A. C.: ‘The reliability and validity of photographic eye movement 
records.” Journal Educational Psychology, Vol. XXIV, 1933, pp. 118-122. 

6. Henmon, V. A. C.: Achievement Tests in the Modern Foreign Languages. 
Publications of the American and Canadian Committees on Modern 
Languages, Macmillan, Vol. V, 1929. 













7. Li, Chen-nan: Factors Conditioning Achievement in the Modern Foreign Lan- 
guages. Unpublished Doctor’s thesis, Yale, 1927. 

8. Miller, W. S.: ‘‘The Variation and Significance of Intelligence Quotients 
Obtained from Group Tests.” Journal Educational Psychology, Vol. XV, 
1924, pp. 359-366. 

9. Moore, Margaret Whiteside: A Study of Young High School Graduates. Teach- 
ers’ College Contributions to Education No. 583, 1933, pp. X + 78. 

10. Thorndike, E. L.: Adult Learning. Macmillan, 1928, pp. X + 335. 










SUPERSTITION AND PERSONALITY 


JAMES PAGE 
Columbia University 


Insecure and emotionally maladjusted individuals frequently 
find in superstitions security and mental peace. By knocking on 
wood they avoid misfortune. By carrying a good luck charm, they 
insure success. If they should fail, they exonerate themselves and 
rationalize their defeat by recourse to superstitions. They were 
doomed to fail because a black cat crossed their path; they started 
the task on the 13th or on a Friday. 

Educational and cultural factors undoubtedly affect belief in 
superstition. College students are less superstitious than high school 
students.¹ The well informed believe fewer superstitions than the 
uninformed.² Rural students possess and are influenced by a 
greater number of superstitions than urban students.*® Children 
coming from better homes tend to be less superstitious than children 
from poorer homes. Negro children are more superstitious than 
white.® 

However, at every educational cultural level we find certain 
individuals who are more superstitious than other members of their 
group. Numerous studies indicate that these individuals are fully 
as intelligent as the less superstitious.°*’* They suffer from a 
personality rather than an intellectual impairment. Experimental 
confirmation of this view has already been reported by Maller and 
Lundeen.⁵ They found that the more superstitious children also had 
more fears and worries. They were more maladjusted emotionally 
than their less superstitious classmates. 

In this study we have attempted to more clearly establish this 
relationship between belief in superstitions and emotional maladjust- 
ment by comparing the number and kind of superstitions reported 
by normal adult individuals with the number and kind of superstitions 
reported by psychotic individuals. In addition, we have 
investigated the relationship between introversion and belief in 
superstitions. 


EXPERIMENTAL PROCEDURE AND RESULTS 


A list of twenty-five popular superstitions (Appendix 1) together 
with the Heidbreder and Neymann-Kohlstedt tests of introversion 
was given to fifty manic-depressive patients, fifty dementia praecox 
patients and fifty hospital attendants. All of the subjects were either 
patients or employees of the Kings Park State Hospital. Since 
the hospital attendants came from the same cultural strata as the 
patients, we may safely assume that the three groups were quite 
comparable as regards education and social background. 

As noted in the directions given in Appendix 1, our subjects were 
requested to report as true all the superstitions they believed and 
to “question mark” all the superstitions they were doubtful about. 
We assumed that all superstitions “question marked” were 
half-believed. The psychotic patients were tested individually by the 
examiner and the normal subjects were tested in groups. Our data 
have been analysed with the following objectives in mind. 


1. A comparison of the three groups as to the average number of superstitions 
believed and half-believed. 

2. A comparison of the three groups as to specific superstitions believed. 

3. The relation of introversion-extroversion to belief in superstitions. 


Our results concerning the first problem are summarized in Table I. 
Both as regards the number of superstitions believed and the number 
half-believed (?), our findings are practically the same. In both 
cases, the psychotic groups report about an equal number of superstitions 
and the normal group reports about half as many as either 
psychotic group. 


TABLE I. GROUP INTER-COMPARISONS AS TO NUMBER OF SUPERSTITIONS BELIEVED 

                       N     Average      Average           Total¹ 
                             believed     half-believed 
Normal                50      2.50          .82              2.91 
Dementia praecox      50      4.58         3.14              6.15 
Manic-depressive      50      5.34         3.04              6.86 

1 Half-believed items counted as one-half. 
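
The weighting described in the footnote can be verified from the two averages in each row of Table I. The following is a minimal sketch; the dictionary layout and the function name are illustrative assumptions, and only the numbers come from the table.

```python
# Averages as printed in Table I: (believed, half-believed) for each group.
averages = {
    "Normal": (2.50, 0.82),
    "Dementia praecox": (4.58, 3.14),
    "Manic-depressive": (5.34, 3.04),
}


def total_score(believed, half_believed):
    """Total with each half-believed item counted as one-half."""
    return round(believed + 0.5 * half_believed, 2)


totals = {group: total_score(*pair) for group, pair in averages.items()}
for group, total in totals.items():
    print(f"{group}: {total:.2f}")
```

The three totals reproduce the last column of the table, which confirms the reading of the flattened rows.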

The reliability of our findings is indicated in Table II. We note 
that the differences between the normal group and the two abnormal 
groups are completely reliable. The differences between the two 
psychotic groups on the other hand are negligible.® 

The marked similarity between the two psychotic groups was 
further confirmed by comparing the three groups as to specific superstitions 
believed. This was done by finding the percentage of each 
group believing each superstition to be true (Appendix 1) and deter- 
mining the reliability of the percentages found. 


TABLE II. RELIABILITY OF DIFFERENCES REPORTED IN TABLE I 

                     Difference   Sigma diff.   D/Sigma diff.   Chances in 100 
Normal and D.P. 
  Believed              2.08          .67           3.10             100 
  Half-believed         2.32          .53           4.38             100 
Normal and M.D. 
  Believed              2.84          .66           4.30             100 
  Half-believed         2.22          .58           3.83             100 
D.P. and M.D. 
  Believed               .76          .82            .93              82 
  Half-believed          .10          .63            .16              56 
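
The last two columns of Table II can be recomputed from the first two. The sketch below assumes that "chances in 100" is the area of the normal curve lying below the ratio D/sigma, multiplied by one hundred; the article does not spell this out, so the interpretation and the function names are assumptions.

```python
from math import erf, sqrt


def critical_ratio(difference, sigma_diff):
    """Difference divided by its standard error (D/Sigma diff.)."""
    return difference / sigma_diff


def chances_in_100(ratio):
    """Normal-curve area below the ratio, scaled to chances in 100."""
    return 100 * 0.5 * (1 + erf(ratio / sqrt(2)))


# (difference, sigma of the difference) as printed in Table II.
rows = {
    "Normal and D.P., believed": (2.08, 0.67),
    "Normal and D.P., half-believed": (2.32, 0.53),
    "Normal and M.D., believed": (2.84, 0.66),
    "Normal and M.D., half-believed": (2.22, 0.58),
    "D.P. and M.D., believed": (0.76, 0.82),
    "D.P. and M.D., half-believed": (0.10, 0.63),
}

for label, (difference, sigma) in rows.items():
    ratio = critical_ratio(difference, sigma)
    print(f"{label}: D/sigma = {ratio:.2f}, chances = {chances_in_100(ratio):.0f}")
```

On this reading a ratio above three gives practical certainty, while the two comparisons between the psychotic groups fall close to an even chance.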





The marked similarity between the two psychotic groups was 
indicated in two ways. In the first place, neither psychotic group 
reported a single superstition as true, reliably more frequently than 
the other group. Secondly, the following six superstitions were 
found to be more typical of both manic-depressive and dementia 
praecox patients as compared with normal individuals. By this 
we mean that the chances are at least ninety-nine in one hundred that 
belief in the following superstitions is more characteristic of manic- 
depressive and dementia praecox patients than of normal individuals. 


SUPERSTITIONS MORE CHARACTERISTIC OF PSYCHOTIC INDIVIDUALS 


The number 7 is the perfect number. 

A person who avoids another person’s gaze is dishonest. 

An expectant mother by fixing her mind on a subject can influence the character 
of her unborn child. 

One can estimate very accurately an individual’s intelligence by looking at 
his face. 

The shape and the prominences of the head indicate one’s character. 

God sometimes talks to certain individuals. 


In addition, believing in the superstition ‘‘a square jaw is a sign 
of will power’’ was more characteristic of the manic-depressive patients 
than of normal individuals. The normal group failed to report a 
single superstition reliably more frequently than either of the psychotic 
groups. 







The relation between introversion and belief in superstition was 
determined by obtaining correlation coefficients between number of 
superstitions believed to be true and introversion scores on both the 
Neymann-Kohlstedt and Heidbreder tests. Considering introvert 
scores as plus on both tests, the correlation coefficients for the one 
hundred fifty cases were: 

Heidbreder–superstitions ................... +0.30 ± .05 
Neymann-Kohlstedt–superstitions ............ +0.10 ± .05 
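
The figures following each coefficient are consistent with the probable error of a correlation coefficient. The sketch below assumes the conventional formula PE = .6745(1 − r²)/√N with N = 150; the article does not name the statistic, so this identification is an assumption.

```python
from math import sqrt


def probable_error_r(r, n):
    """Probable error of a correlation coefficient r from n cases."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


N_CASES = 150  # fifty subjects in each of the three groups

for test_name, r in [("Heidbreder", 0.30), ("Neymann-Kohlstedt", 0.10)]:
    pe = probable_error_r(r, N_CASES)
    print(f"{test_name}: r = {r:+.2f}, PE = {pe:.3f}, r/PE = {r / pe:.1f}")
```

Both probable errors round to .05, matching the printed values. The Heidbreder coefficient is about six times its probable error, the Neymann-Kohlstedt coefficient less than twice.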


The positive relationship existing between introversion and belief 
in superstitions was further brought out by comparing the ten most 
introverted individuals in each group with the ten most extroverted 
(Table III). 


TABLE III. RELATION OF BELIEF IN SUPERSTITIONS TO INTROVERSION 

                              Average number of superstitions believed 
                                D.P.       M.D.       Normal 
Heidbreder test 
  Ten most introverted          8.65       8.00        3.75 
  Ten most extroverted          3.65       5.65        1.70 
Neymann-Kohlstedt test 
  Ten most introverted          8.05       7.40        2.45 
  Ten most extroverted          4.70       6.05        3.40 





On both tests, the more introverted psychotic patients were also 
the more superstitious. This relationship held also for the normal 
group on the Heidbreder test. The ten most introverted normal 
individuals on the Neymann-Kohlstedt test, however, were not as 
superstitious as the ten most extroverted individuals. This discrepancy 
may account for the comparatively lower correlation coefficient 
found between the Neymann-Kohlstedt test and superstition 
score. 


SUMMARY 


By means of psychological tests of a questionnaire type we have 
compared normal and psychotic individuals as regards belief in 
superstitions. In addition, we have investigated the relationship of 
belief in superstitions to introversion. Our results indicate: 













1. Normal individuals are less superstitious than psychotic 
individuals. Manic-depressive and dementia praecox patients believe 
twice as many superstitions as normal individuals. 

2. Manic-depressive patients resemble dementia praecox patients 
both as to number and kind of superstitions believed. 

3. There is a positive correlation between introversion and belief 
in superstitions. The more introverted patients tend to be more 
superstitious. For normal individuals this relationship held on the 
Heidbreder test but not on the Neymann-Kohlstedt test. 


APPENDIX 1 


The twenty-five superstitions studied are listed below. In order 
to economize on space we have omitted the “True,” “False” and “?” 
which preceded each question. The figures given at the left represent 
the percentages of each group believing each statement to be 
true. M.D. indicates the manic-depressive group, D.P. the dementia 
praecox group and N the normal group. 

Directions: Read the first statement. If you believe that it is 
true draw a circle around “T.” If you believe that it is false, draw 
a circle around “F.” If in doubt, draw a circle around “?.” Do the 
same with all the other statements. 


Per Cent True 
M.D. D.P. N. 


One who breaks a mirror has seven years of bad luck. 

Friday the thirteenth always brings bad luck. 

A task begun on Friday is doomed to failure. 

Finding a horseshoe brings good luck. 

It is well to tap on wood after boasting of one’s good fortune. 

If an expectant mother sees a person with a birth-mark, her child 
also will have a birth-mark. 

The lines in the palm of the hand indicate one’s future. 

It is possible to communicate with departed spirits. 

The position of the planets at the time of one’s birth determines 
one’s character and fortune. 

8 12 4 Fortune tellers can predict one’s future. 

14 18 2 The number seven is the perfect number. 

56 54 36 A left-handed person should be taught to use his right hand. 

30 30 8 A person who avoids another’s gaze is dishonest. 

28 20 22 A snake never dies until after sundown. 

26 20 20 Very intelligent children are usually very weak physically. 

26 26 32 Men are more intelligent than women. 

20 14 22 The children of first cousins are always feeble-minded. 

34 26 6 An expectant mother by fixing her mind on a subject can influence 
the character of her unborn child. 



40 28 20 An artistic nature is indicated by long, slender hands. 

54 40 18 One can estimate very accurately an individual’s intelligence by 
looking at his face. 

40 30 8 The shape and the prominences of the head indicate one’s character. 

6 4 2 Three on a match is bad luck. 

20 22 8 One mind can communicate to another by telepathy, that is without 
the use of signs or sounds. 

32 26 10 A square jaw is a sign of will power. 


26 18 4 God sometimes talks to certain individuals. 


BIBLIOGRAPHY 


1. Caldwell, O. W. and Lundeen, G. E.: “Students’ Attitudes Regarding Unfounded 
Beliefs.” Science Education, Vol. XV, 1931, pp. 246-266. 

2. Gilliland, A. R.: “A Study of the Superstitions of College Students.” J. of 
Abn. and Soc. Psychol., Vol. XXIV, Jan.–Mar., 1930, pp. 472-479. 

3. Caldwell, O. W. and Lundeen, G. E.: “Changing Unfounded Beliefs. A Unit 
in Biology.” Social Science and Mathematics, April, 1933, pp. 394-413. 

4. Lundeen, G. E. and Caldwell, O. W.: “A Study of Unfounded Beliefs Among 
High School Seniors.” J. of Educ. Research, Vol. XXII, 1930, pp. 257-273. 

5. Maller, J. B. and Lundeen, G. E.: “Superstition and Emotional Maladjustment.” 
J. of Educ. Research, April, 1934, pp. 3-28. 

6. Wagner, M. E.: “Superstitions and their Social and Psychological Correlatives 
Among College Students.” J. of Educ. Sociology, Vol. II, Sept., 1928, pp. 
26-36. 

7. Powers, F. F.: “The Influence of Intelligence and Personality Upon False 
Beliefs.” J. of Soc. Psychol., Vol. XI, Nov., 1931, pp. 490-493. 

8. Garrett, H. E., and Fisher, T. R.: “The Prevalence of Certain Popular Misconceptions.” 
J. of Applied Psychol., Vol. X, 1926, pp. 411-421. 

9. Garrett, H. E.: Statistics in Psychology and Education. Longmans, Green and 
Co., New York, pp. 118-145. 





SOME MEASUREMENTS OF THE EFFECTS OF 
REVIEWS 


H. A. PETERSON, MARY ELLIS, NORINE TOOHILL, AND PEARL KLOESS 
Illinois State Normal University 


This investigation deals with two somewhat different but related 
phases of the value of reviewing. The first problem is, given a piece 
of curricular material, say a passage in history, one page in length 
studied on a given day: Of how much benefit is a review a week later, 
and of how much benefit are two reviews? How permanent are the 
benefits? The second problem: What is the relative effectiveness for 
recall purposes of different locations of a review between the time of 
learning and the time of being tested? 


FIRST PROBLEM 


With regard to the first problem, the first review was located one 
week after learning, and the second, when there was one, two weeks 
after learning. The retention-intervals chosen were two weeks, three 
weeks, six weeks, and eighteen weeks. A test for recall is necessarily 
a continuation of learning, and complicates the situation, introducing 
an irrelevant factor, namely, the effect of an earlier test on a later test. 
Hence we employed many equivalent groups of subjects, and allowed 
each group to appear in the experiment but once. We assigned each 
experimental group to be tested at a single retention-interval only. 
In all we used ten groups, assigning two of them to the two-weeks 
retention-interval, two to the three-weeks retention-interval, three to 
the six-weeks interval, and three to the eighteen-weeks retention- 
interval (see Fig. 1). These groups at each retention-interval were 
equated on the basis of scores in a test of retention of a prose passage 
for one week. The subjects were classes in elementary psychology 
in the Illinois State Normal University. The groups varied from 
twelve to twenty-seven in size, averaging nineteen. 

All of the groups learned the same selection, which was an historical 
passage, twenty-five lines in length, The Origin of Monasticism in 
Western Europe.¹ They were given the passage in mimeograph form, 
with instructions to study it as one would study an ordinary history 
lesson, to read the entire passage once, and spend the remainder of 





1 Adapted from Emerton, E.: Introduction to the Middle Ages, Ch. XI. 


the 2.5 minutes on whichever parts they seemed least sure of. Imme- 
diately after the study period, there was a free written reproduction 
in essay form with a time-limit of twelve minutes. These were scored 
for ideas, the passage being considered to contain forty-three ideas or 
facts. These scores furnished the group averages in immediate recall. 
The reviews were all alike, and consisted in a repetition of the condi- 
tions of original learning in all respects, except that the written repro- 
ductions were not scored. They were reviews with the text and 
followed by a test. This is a rather thorough review. The tests 
for retention were likewise written essay-reproductions and were 
scored for ideas. 


RESULTS 


On the whole the benefits of the reviews were large and relatively 
permanent, as shown in Fig. 1. For example, after two weeks the 
one-review group showed a superiority of forty-seven per cent over 
the control group, which had no review, and the condition is much 
the same after three weeks. After six weeks the one-review group 
showed a superiority of twenty-eight per cent, and the two-review 
group of seventy-five per cent over the control group. After eighteen 
weeks the one-review group showed a superiority of eighteen per cent, 
and the two-review group a superiority of fifty-seven per cent over 
the control group. 

FIG. 1. Percentage of selection recalled after two, three, six, and eighteen weeks. At 
two and three weeks the left bar is the control group, and the right one the one-review 
group. At six and eighteen weeks the three bars represent from left to right the control 
group, the one-review, and the two-review group. (Bar chart; vertical axis, per cent; 
horizontal axis, weeks.) 

Another significant fact is that the second review is even more 
effective than the first. For example, after 6 weeks the superiority 
of the two-review group is almost three times that of the group which 
had one review, and after eighteen weeks it is more than three times as 
effective. 
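
The comparison of the two reviews rests on simple ratios of the percentages quoted in the preceding paragraph. A minimal sketch follows; the data layout and function names are illustrative assumptions, and only the four percentages come from the text.

```python
# Superiority over the control group, in per cent, as quoted in the text.
superiority = {
    6: {"one_review": 28, "two_reviews": 75},
    18: {"one_review": 18, "two_reviews": 57},
}


def two_to_one_ratio(weeks):
    """Superiority of the two-review group relative to the one-review group."""
    row = superiority[weeks]
    return row["two_reviews"] / row["one_review"]


def second_review_increment(weeks):
    """Percentage points of superiority added by the second review."""
    row = superiority[weeks]
    return row["two_reviews"] - row["one_review"]


for weeks in superiority:
    print(
        f"{weeks} weeks: ratio = {two_to_one_ratio(weeks):.2f}, "
        f"increment from second review = {second_review_increment(weeks)} points"
    )
```

At both intervals the increment attributable to the second review exceeds the whole superiority produced by the first, which is the sense in which the second review is the more effective.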










Thirdly, the increments added by all the reviews lessen more 
rapidly with time than the original learning. This is seen by compar- 
ing in Fig. 1 the ratio of columns zero to one (the control group and the 
group having one review) at two weeks with the corresponding ratio 
at the other retention-intervals; also by comparing the ratios % and 
1 at six weeks with the same ratios at eighteen weeks. 

In Fig. 1 the results are stated as percentages of the passage 
recalled by the different groups. In Fig. 2 the same results are 
stated as ratios of delayed to immediate recall. Whatever each person’s 
immediate recall was, it was taken as one hundred per cent, 
and the amount he recalled in delayed recall was calculated as a 
percentage of his immediate recall without regard to whether the units 
in the delayed recall were the same as in the immediate recall or not. 
Thus, after two weeks the control group had about sixty per cent as 
much as on the day of learning, while the group which had a review 
had almost as much as on the day of learning, and the same is true 
after three weeks. After six weeks two reviews put the persons in 
that group in possession of more than on the day of learning, and 
after eighteen weeks, those who had two reviews recalled ninety per 
cent as much as on the day of learning. From this standpoint the 
benefits of the reviews are decidedly encouraging. 

FIG. 2. Ratio of delayed recall to immediate recall, the latter being represented as one 
hundred per cent. For explanation of the bars, see Fig. 1. (Bar chart; vertical axis, per cent; 
horizontal axis, weeks.) 


RELIABILITIES 


When the results were in this form, we calculated the probable 
errors of the ten group averages in Fig. 2. From left to right, they 
were: .61 ± .12, .97 ± .13, .55 ± .13, .96 ± .16, .60 ± .17, .82 ± .15, 
1.14 ± .11, .53 ± .10, .66 ± .13, .90 ± .16, all of which, except one, 
indicate satisfactory reliability. 
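
The article does not say which average is the exception or what criterion was applied. One hedged reading is the customary rule that a measure should be at least four times its probable error; the sketch below applies that assumed threshold to the ten averages, with group labels taken from the order of the bars described under Fig. 1.

```python
# (group, ratio of delayed to immediate recall, probable error), left to right.
groups = [
    ("2 weeks, control", 0.61, 0.12),
    ("2 weeks, one review", 0.97, 0.13),
    ("3 weeks, control", 0.55, 0.13),
    ("3 weeks, one review", 0.96, 0.16),
    ("6 weeks, control", 0.60, 0.17),
    ("6 weeks, one review", 0.82, 0.15),
    ("6 weeks, two reviews", 1.14, 0.11),
    ("18 weeks, control", 0.53, 0.10),
    ("18 weeks, one review", 0.66, 0.13),
    ("18 weeks, two reviews", 0.90, 0.16),
]

THRESHOLD = 4.0  # assumed rule of thumb, not stated in the article


def reliability_ratio(mean, probable_error):
    """Average divided by its probable error."""
    return mean / probable_error


weak = [
    label
    for label, mean, pe in groups
    if reliability_ratio(mean, pe) < THRESHOLD
]
print(weak)
```

Under this assumed criterion exactly one of the ten averages falls short, which agrees with the statement that all but one are satisfactory.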

The question may occur to some whether it is permissible to construct 
forgetting curves by combining the corresponding average 
recalls at different retention-intervals, and thus obtain a forgetting 
curve for the no-review and for one-review groups. Inasmuch as the 
groups at different retention-intervals are not sufficiently nearly 
equivalent, this is questionable. 

In conclusion, employing a rather short selection, a rather short 
learning time, and a review of a rather thorough type, namely, a 
repetition of the original learning, so far as the conditions were con- 
cerned, we found that the benefits of the reviews were large and rela- 
tively permanent. Secondly, we found that the effects of the reviews 
showed a greater tendency to shrinkage with time than the effects of 
the original learning. Thirdly, a second review makes a very good 
showing, as compared with the first review, being on the whole even 
more effective. 


SECOND PROBLEM 


With regard to our second problem, the question when it is best to 
review has been before the public for a long time but little has been 
done with it experimentally. Thorndike, using paired associations, 
came to the conclusion that the earliest reviews should come soon after 
learning and be relatively long; that thereafter the reviews should 
occur at increasingly long intervals, and occupy less time.¹ However,
he took no account of the factor, nearness to the time of the test for 
recall. College students are well aware of the favorableness of this 
location, as shown by the fact that they spend much time in reviewing 
just before an examination. We may illustrate the situation. Suppose
one studies a selection on a given day, and the time for recall
is fixed at the tenth day. Suppose we choose two locations for 
reviews, the earlier on the second or third day and the later on the 
seventh day after learning. The earlier has the advantage of nearness 
to the learning. It can rescue much from oblivion because there is 
much to rescue in the mind at that time. The later has the advantage 
of nearness to the time of being tested. Any review shows the same 





1 Thorndike, E. L.: The Fundamentals of Learning, Ch. 6. 










course of forgetting as an original learning, and one may review too 
soon. Moving a review closer to learning increases one advantage 
and lessens the other. Accordingly, the second problem was, What 
are the relative strengths of the different locations between learning 
and recall? We chose to experiment with the locations we have just
described.

(1) Comparison of a two-day with a seven-day review-interval. Here 
we compared the effects of a review placed two or three days after 
learning with one seven days after learning, the tests being on the 
tenth and the twenty-first days. In this case the same subjects were 
tested on both days. 

The subjects were classes in elementary psychology. The equiva- 
lent group method was used, there being three groups of fourteen 
each equated by the Iowa Silent Reading Test. The experiment 
was tried twice with different learning selections. Both selections 
were much longer than in our investigation of the benefits of reviews. 
There they were one page; here they were six pages long. One was a 
long version of the selection previously used, The Origin of Monastic 
Orders in Western Europe; the other was an adaptation of a part of 
Chapter 19 in Woodworth’s Psychology, first edition, on Worry and 
Day Dreams. The study times were fifty per cent more than the 
reading times of the selections. The reviews consisted of another 
study period of the same length as in original learning, but no test 
was given. It was therefore a review with text, but no test. The 
tests for retention were essay-answer, and the same test was used on 
both the tenth and the twenty-first days. In other respects the 
technique was the same as in the first investigation. 


RESULTS 


The result was that the two locations were equally effective in 
both trials. The data are given in Table I. The percentages in 
Monasticism and Worry are the ratios of delayed to immediate recall. 
In the first trial (Monasticism) there was a slight advantage for the 
earlier location, but it was not significant in amount; in the second 
(Worry) the two locations were equally effective. 

A repetition of the experiment using German-English vocabulary 
couplets gave the same result. The review-intervals were the same 
as before. There were ten couplets in a series, and four series totalling 










TABLE I.—COMPARISON OF A REVIEW TWO OR THREE DAYS AFTER LEARNING
WITH ONE SEVEN DAYS AFTER LEARNING

                              Recall after    Recall after
Cases    Review intervals     ten days,       twenty-one days,
                              per cent        per cent

Selection: Monasticism
14       2 or 3 days          132             120
14       7 days               123             113
13       None                  90.6            84.9

Selection: Worry and Day Dreams
21       2 or 3 days           95              85
23       7 days               100              82
21       None                  70              60








COMPARISON OF THE SAME REVIEW INTERVALS USING VOCABULARY COUPLETS

Cases    Review intervals     Recall after    Recall after
                              ten days        twenty-one days
14       2 or 3 days          18.5            14
14       7 days               19.14           13.07
13       None                  6               6














TABLE II.—COMPARISON OF A REVIEW ONE DAY AFTER LEARNING WITH ONE
NINE DAYS AFTER LEARNING

         Review      Recall              Recall              Recall after
Cases    interval    after       SD      after       SD      twenty-one     SD
                     learning            ten days            days

Selection: The Newspaper
24       1 day       18.3        1.20    20.4         .91    20.1            .95
24       9 days      19.8        1.10    19.9        1.10    20.3            .91
25       None        18.6         .89    17.4         .72    17.2            .88

Selection: Immigration
22       1 day       12.6         .73    14.3         .69    14.3            .65
22       9 days      11.4         .52    13.5         .78    14.4            .56
24       None        12.7         .75    11.9         .62    11.0            .58













forty couplets. In the tests the German word was given, and the 
English equivalent called for. The figures give the average number of 
couplets recalled out of forty. 

We interpret the result as a case of compensation, nearness to 
learning in the earlier review-location being offset or compensated 
for by nearness to the test in the later location. If this theory is 
correct, moving the earlier review nearer to learning and the later one 
nearer to the test by an equal amount would probably show the two 
locations to be still equally effective. We tested the matter out by 
moving the earlier review from two to three days after learning to one 
day after, and the later review from seven days to nine days after 
learning. The tests were kept at the tenth and the twenty-first days. 
Sophomore classes in elementary sociology were used as subjects. 
The learning selections were an eleven-page adaptation of Chapter 4
on The Newspaper in Park and Burgess’ The City, and a rather difficult
six-page selection on Immigration taken from Devine’s The Principles
of Relief. The results were the same as before: The two locations for
review, one day and nine days after learning, were equally effective. 
The results are given in detail in Table II. Maximum scores obtain- 
able: The Newspaper, thirty-five, Immigration, twenty-one. 
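
Table I reports delayed recall as a percentage of immediate recall, whereas Table II reports raw averages. A minimal sketch, assuming the same ratio is wanted for Table II, is given below. The function names are illustrative and the figures are the averages printed in Table II.

```python
# Average recall from Table II: (after learning, tenth day, twenty-first day).
TABLE_II = {
    "The Newspaper": {
        "1 day": (18.3, 20.4, 20.1),
        "9 days": (19.8, 19.9, 20.3),
        "None": (18.6, 17.4, 17.2),
    },
    "Immigration": {
        "1 day": (12.6, 14.3, 14.3),
        "9 days": (11.4, 13.5, 14.4),
        "None": (12.7, 11.9, 11.0),
    },
}


def percent_of_immediate(immediate, delayed):
    """Delayed recall as a percentage of immediate recall, to one decimal."""
    return round(100.0 * delayed / immediate, 1)


def retention_table(table):
    """Map selection -> review interval -> (per cent at 10 days, at 21 days)."""
    return {
        selection: {
            interval: (
                percent_of_immediate(first, tenth),
                percent_of_immediate(first, twenty_first),
            )
            for interval, (first, tenth, twenty_first) in groups.items()
        }
        for selection, groups in table.items()
    }


for selection, groups in retention_table(TABLE_II).items():
    print(selection)
    for interval, (p10, p21) in groups.items():
        print(f"  {interval:7s} {p10:6.1f} {p21:6.1f}")
```

On these figures the no-review groups retain between 93 and 94 per cent of their immediate recall at the tenth day in both selections, while both review groups stand at or above one hundred per cent.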

To summarize the results of our investigation of the effects of dif- 
ferent locations of reviews, we have shown that the value of any given 
location depends on its nearness to learning on the one hand and to 
the time of reproduction on the other. These factors are compensa- 
tory. With reproduction set at the tenth and twenty-first days we 
found two pairs of locations, viz., the second or third day and the 
seventh day, and the first and the ninth day at which the reviews were 
equally effective. 

One might expect that the later location would prove inferior to 
the earlier if one were to wait longer than the tenth day to test. The 
tests on the twenty-first day showed the two review-locations to be 
still equal. 

Several interesting questions for further investigation are sug- 
gested by these results. In the first place, it may be necessary 
to revise the current theory of when to review and how long, as stated 
by Thorndike, and take into account the location of the review with 
reference not only to learning but also to the time when the informa- 
tion is going to be used. In the second place with regard to the situa- 
tion in our own experiment, what would be the effect of postponing 
the test from the tenth to, say, the sixtieth day after learning, 







keeping the reviews located where they have been? Again, is there 
much difference in any of the locations between learning and the 
tenth day, as long as we work with a pair of balanced reviews in this 
fashion, and continue the use of the text in the review? So far, we 
have not found the recalls highly sensitive to slight shifts in the loca- 
tions of the reviews. 











A NEW DEVICE THAT SCORES TESTS¹


NOEL B. CUFF 
Eastern Kentucky State Teachers College 


I. INTRODUCTION


Teachers have long been confronted with the difficult problems 
involved in assigning grades. There is an accumulation of evidence, 
however, which indicates progress has been made in developing 
techniques. Nevertheless an obvious need for greater accuracy in 
scoring continues to exist. And data presented in several different 
contributions show also that serious errors are almost certain to creep 
into the scoring of objective tests by students, teachers, or clerical 
workers. A start was made, however, in the direction of scoring tests
accurately and rapidly, over a decade ago, by Stone. He introduced 
a separate answer slip on which a subject could indicate his answers 
to questions. Since then, carbon, mimeograph, chemical, perforation, 
and other marking schemes have been proposed. It is obvious these 
devices increase the rate and accuracy of scoring above that obtainable 
by the common methods of using a key. But they do not eliminate 
the errors and the time involved in counting answers. Electrical 
scoring machines have been used experimentally to reduce these 
counting errors in addition to eliminating the marking errors. We 
have used scales too, such as are used in check weighing, in inventory 
counting, and in numerous other operations where speed and accuracy 
are imperative, to determine rapidly and with accuracy the number 
of correct answers to objective tests. 


II. INVESTIGATION 


The problems of this study are: (1) To determine how rapidly and 
accurately tests can be scored by weighing the examinee’s responses. 
(2) To compare the speed and the accuracy of determining scores by 
weight with one of the best methods of scoring tests. 

The procedure involved in this investigation was relatively simple. 
Mimeographed multiple choice tests of one hundred statements, 
with five possible responses for each statement, were given to one 
hundred twenty college students. The technique introduced by Otis 





1 Read at the 42d annual meeting of the American Psychological Association, 
Columbia University, 1934. 




for recording answers was used. After the choices had been recorded 
in the spaces provided for them, each student was given a Perforated 
Answer Card which consisted of two duplicate cardboards with five 
hundred holes, between which was held a sheet of manifold typewriter 
paper. The students were instructed to write their names, class, and 
the like on the inserted sheet of paper—the top cardboard was shorter 
than the bottom, so that this could be done easily. Then each 
student was told to transfer his answers from the mimeographed 
test to the answer card by punching with a pencil point through the 
hole in a row that corresponded with the number of his choice for a 
given question. For example, the instructions were: “If you think 
the correct answer to the first question is the third, punch through the 
paper in the third hole of the first row of holes. Then if you think 
the fifth choice is the best choice for the second statement, punch 
through the fifth hole of the second row of holes, and continue in this 
way for each of the one hundred statements.” After a student’s 
answers had been transferred to a perforated answer card, the answers 
on the mimeographed test and on the answer card were checked care- 
fully to insure a perfect agreement. (Probably it should be stated 
that as a rule the answers would be indicated in the beginning on the 
answer card and then the unmarked questions could be filed for future 
use.) The one hundred twenty mimeographed tests were scored by 
ten senior college students. They were instructed during a laboratory 
period to score and rescore the papers until the same answer was 
secured twice in succession. All scores and the time required for 
scoring were separately recorded. 

The number of correct perforated answers was determined by 
weight. The apparatus used to weigh answers is called a testometer. 
It consists of an open frame on which an answer card is placed just 
above the platter of a scale and of a block or weight elevator, above 
the answer card frame, that is in length and breadth about the dimen- 
sions of the perforated answer card. The weight elevator has five 
hundred holes in it that exactly correspond to the perforations in 
the answer card. The operator in setting the apparatus for weighing 
the answers to the tests used in this study, inserted a slender wire 
pin, with a head larger than the hole in the elevator, in the third 
hole of the first row of holes, another in the fifth hole of the second 
row, and so on because those holes corresponded with the holes in 
the answer cards which should have been punched by the students. 
After setting the apparatus, the operator placed an answer card on 







the frame and lowered the weight elevator. At every place where the 
examinee had punched a correct hole, a weight perpendicular to the 
elevator and acted on by a vertical force (gravity) fell to and stood 
on the rectangular platter of a scale. The weights—a fourth of an 
ounce each—however, were light enough so that if the paper had not 
been punched in a given answer hole, the paper held up that weight. 
A scale was used that has minimum graduations of an eighth of an 
ounce. This scale is particularly designed for light drafts and permits 
a minimum tolerance value of one sixteenth of an ounce and likewise 
a maximum sensibility reciprocal of one eighth of an ounce. The
manufacturer furthermore claims it is sensitive and accurate to less 
than a sixty-fourth of an ounce. The rate and accuracy of scoring 
tests by weight are indicated by the results. 
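
Since each weight is a fourth of an ounce and the scale is graduated in eighths, every correct answer should move the reading by two graduations. The article does not spell out how a reading is turned into a score, so the sketch below is only one plausible reading: it assumes the platter reads zero when empty, and the function name and the error handling are hypothetical.

```python
from fractions import Fraction

WEIGHT_PER_PIN = Fraction(1, 4)    # ounces per weight, as stated in the text
SCALE_GRADUATION = Fraction(1, 8)  # smallest scale division, in ounces


def score_from_weight(reading_oz):
    """Convert a scale reading in ounces to a count of correct answers.

    The reading is snapped to the nearest scale graduation and divided by
    the weight of one pin. A reading that does not correspond to a whole
    number of pins raises ValueError (hypothetical handling).
    """
    reading = Fraction(reading_oz)
    graduations = round(reading / SCALE_GRADUATION)
    snapped = graduations * SCALE_GRADUATION
    pins = snapped / WEIGHT_PER_PIN
    if pins.denominator != 1:
        raise ValueError(f"reading {snapped} oz is not a whole number of pins")
    return int(pins)


print(score_from_weight(18.75))  # 75 correct answers
print(score_from_weight(25))     # all one hundred answers correct
```

A reading that lands on an odd graduation, such as one eighth of an ounce, cannot come from whole weights and is rejected rather than rounded in this sketch.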


III. RESULTS 


The findings are presented in Tables I and II. 

Table I shows the rate at which students can score tests by one 
of the most rapid of the usual methods and the rate at which the 
present experimental model of the testometer can be operated for 
scoring tests by weight. The students scored the tests one time 
at the rate of 17.6 tests per hour, with a standard error of 1.39, and 
they scored the tests two times at the rate of 9.76 tests per hour, with 


TABLE I.—RATE OF SCORING TESTS

                                            Average    σ Average
Tests scored per hour
  (1) By students, once                      17.60       1.39
  (2) By students, twice                      9.76        .78
  (3) By testometer                         180.00      14.55
Difference between (1) and (3)              162.40      14.60
D/σ diff.                                    11.12
Chances in 100 of a true difference         100
Per cent (subheadings illegible)            [illegible]    5























a standard error of .78. The tests were scored by weighing at the rate 
of 180¹ ± 14.55 tests per hour. Hence it is obvious that the tests
were scored about ten times faster by weighing than by having students 
score them one time and twenty times faster than by having students 





1 Two operators can score about three hundred sixty tests per hour on the
same testometer. 







score them twice. The minimum difference between the average 
number of tests scored per hour by students and by the testometer 
is 11.12 times the sigma of the difference. Consequently the chances 
are one hundred in one hundred that a true difference in rate exists 
in favor of the testometer. 
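
The figure of 11.12 is the difference in rate divided by the sigma of the difference. The sketch below reproduces it from the values in Table I and, as an assumption about how the "chances in one hundred" column was obtained, reads the chances from the normal curve. It also computes the usual standard error of a difference between two independent averages for comparison. The function names are illustrative.

```python
import math


def critical_ratio(difference, sigma_difference):
    """Difference divided by its standard error."""
    return difference / sigma_difference


def chances_in_100(ratio):
    """Chances in one hundred that the true difference exceeds zero,
    read from the normal curve (an assumed interpretation of the table)."""
    return 100.0 * 0.5 * (1.0 + math.erf(ratio / math.sqrt(2.0)))


students_once, students_twice, testometer = 17.60, 9.76, 180.00

difference = testometer - students_once        # tabulated as 162.40
ratio = critical_ratio(difference, 14.60)      # 14.60 is the tabulated sigma

# Standard error of a difference between independent averages.
sigma_from_parts = math.sqrt(1.39 ** 2 + 14.55 ** 2)

print(round(difference, 2), round(ratio, 2), round(chances_in_100(ratio)))
print(round(sigma_from_parts, 2))

# Speed of the testometer relative to scoring once and twice by hand.
print(round(testometer / students_once, 1), round(testometer / students_twice, 1))
```

The ratio comes to 11.12 and the chances round to one hundred, as tabulated. The recomputed sigma of the difference is about 14.62, close to the printed 14.60.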

Table II gives a brief summary of the accuracy of scoring tests 
by the methods being compared. The data in this table show that 
thirty-one of the tests were scored one point too high or low, twelve 
were incorrectly scored two points, and two were misscored ten 
points, the first time they were scored by students. The average 
error was .97, but the range, which is probably more significant in 
this case, was ten points. In comparison, only one of the one hundred 
twenty papers was incorrectly scored on the testometer. The total 
error and the range of errors for the testometer was one point and the 
average error was .008. This error was probably due to parallax 
and, if so, it could have been avoided by having an additional indicator 
on the scale to facilitate the correct positioning of the eye of the 
observer properly to read the indications of the scale. It is evident, 
however, from the table that the average error for tests scored one 


TABLE II.—ACCURACY OF SCORING TESTS

                          Size and number of errors per one hundred twenty tests
                                                                                          Per cent
Tests scored                0    1    2    3    4    5    6    7    8    9   10   Average  σ Average    scored
                                                                                                       correctly
(A) Once by students       66   31   12    5    3   ..    1   ..   ..   ..    2    .97     .147         55
(B) Twice by students      95   14    6    4   ..   ..   ..   ..   ..   ..    1    .40     [illegible]  79
(C) By testometer         119    1   ..   ..   ..   ..   ..   ..   ..   ..   ..    .008    .008         99.99

Ratio, A to C .......................................... 120.9
Ratio, B to C ..........................................  50.0
Difference between A and C .............................   .859   (σ .147)
D/σ diff. ..............................................   5.85
Chances in one hundred of a true difference ............ 100








time by students is 120.9 times the average error for tests scored by
weight, and that the average error for tests scored twice by students
is fifty times the error for weighing tests. Likewise the table shows
that fifty-five per cent of the tests were correctly scored the first time 
by students and seventy-nine per cent were correctly scored the second 
time. In comparison, 99.99 per cent of the test items were correctly 










scored by means of the testometer. This evidently shows that the 
number of tests misscored by students was from twenty-one to forty- 
five times the number incorrectly scored by weight. Furthermore the 
difference between the average error for tests scored once by students 
and the average error for tests scored with the testometer is 5.85 times 
the sigma of the difference. This indicates the chances are one
hundred in one hundred that a reliable difference greater than zero 
exists between the average errors for the two methods. 
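
The percentages in Table II mix two units: papers for the students' scoring and test items for the testometer. The sketch below recomputes them from the error counts. It assumes that each point of error corresponds to one misscored item and that each test has the one hundred items described earlier. The placement of the scattered entries in the first row is hard to read in the source, so only the first column and the row total are relied on for that row. The function names are illustrative.

```python
# Number of papers at each size of error (0 to 10 points), read from Table II.
ONCE_BY_STUDENTS = [66, 31, 12, 5, 3, 0, 1, 0, 0, 0, 2]
TWICE_BY_STUDENTS = [95, 14, 6, 4, 0, 0, 0, 0, 0, 0, 1]
BY_TESTOMETER = [119, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def percent_papers_correct(counts):
    """Percentage of papers scored with no error at all."""
    return 100.0 * counts[0] / sum(counts)


def total_points_in_error(counts):
    """Sum of error sizes over all papers."""
    return sum(size * n for size, n in enumerate(counts))


def mean_error(counts):
    """Average error per paper, in points."""
    return total_points_in_error(counts) / sum(counts)


def percent_items_correct(counts, items_per_test=100):
    """Percentage of items scored correctly, assuming one item per point of error."""
    items = sum(counts) * items_per_test
    return 100.0 * (items - total_points_in_error(counts)) / items


print(round(percent_papers_correct(ONCE_BY_STUDENTS)))       # papers, scored once
print(round(percent_papers_correct(TWICE_BY_STUDENTS)))      # papers, scored twice
print(round(percent_papers_correct(BY_TESTOMETER), 1))       # papers, testometer
print(round(percent_items_correct(BY_TESTOMETER), 2))        # items, testometer
print(round(mean_error(TWICE_BY_STUDENTS), 2), round(mean_error(BY_TESTOMETER), 3))
```

On these counts the testometer's 99.99 per cent refers to items (11,999 of 12,000), while 119 of 120 papers is about 99.2 per cent. The students' 55 and 79 per cent refer to papers.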


IV. CONCLUSIONS 


The data presented in this study apparently justify the following 
conclusions: 

1. Test scores can be determined by weight from ten to forty 
times faster than they can be determined by one of the best of the 
usual procedures. 

2. The number of papers misscored per hundred by scorers is from 
twenty-one to forty-five times the number inaccurately scored per 
hundred by means of a testometer. 

3. The average error for test scores computed by the Otis procedure 
is from fifty to one hundred twenty times the average error for scores 
computed by weight. 

4. The differences between the rate and accuracy of scoring by 
scorers and by a testometer are over five times their respective stand- 
ard errors. Hence, true differences in favor of the testometer are 
insured. 

5. Scoring tests by weight is economical. It saves the scorer’s 
time, it enables the examiner to file the unmarked questions for later 
use, and it permits the instructor to file the scored sheets in a limited 
space. 

6. Scoring tests by weight is efficient. It reduces error and 
increases speed, thus making it possible for the teacher to devote
her time to real teaching. 

7. Students like to have their scores determined by weight. It 
enables them to learn their scores promptly and eliminates from the 
grading certain questionable personal equations.











BOOK REVIEWS 


CHARLES A. ELLWOOD. Methods in Sociology. Duke University
Press, 1933. 


In this collection of articles which have already appeared in 
various journals, Prof. Ellwood contributes a closely-reasoned, 
plausible case for the consideration of “values” in the social sciences.
He protests the common, contemporary attempts to strip sociological
study of all save the most rigid, “objectivistic” technique which has
been applied with such handsome results to the natural sciences. His
objections to enlisting the purely “behavioristic” approach to sociology
are based on these methodological errors: First, that it refutes
“scientific” endeavor in so far as it closes the door to all other
experimental attitudes; second, that it is apt to state its results in language
unintelligible except to the initiated (which is not an insuperable 
objection); and third, that it offers no adequate basis for dealing 
scientifically with the so-called non-material aspects of culture. 

In furthering his argument Ellwood maintains that the role of the 
imagination, important as it is in the physical sciences, must play 
even a larger part in the social sciences, and virtually takes the place 
of observation. The historical studies should play a pre-eminent 
part in all sociological work because much of individual behavior, 
social interaction and especially institutions can be analyzed ulti- 
mately as products of an historical evolution. Abetted by psy- 
chological analysis, the historical method takes the place in the social 
sciences of the instruments of precision and mensuration in the 
physical sciences. Sociology is to get its facts, therefore, from three 
sources: Anthropology and ethnology, written history, and finally 
from the social survey, all to keep the discipline inductive. 

But all these data remain impervious to the understanding, cold 
and inutile, until they are informed by interpretation. Even the 
artifacts of prehistory remain nothing but curios and unintelligible, as
Prof. Jensen observes in a revealing introduction to this small volume, 
until we discover how they functioned within a system of values. 

The entire burden of the argument is persuasive enough and rests
upon a well-considered reaction to the needless worship of precision
and objectivity, but like all such polemical and methodological
treatises, it leaves us all the more impatient to see done that
spade-work required to produce the long-awaited “science of society.”
NATHAN MILLER.


Carnegie Institute of Technology. 


EDWARD L. THORNDIKE and OTHERS. Prediction of Vocational
Success. New York: The Commonwealth Fund, 1934, pp. 284. 


Vocational Guidance enthusiasts will have an interesting time 
reading the results of a ten-year study in the guidance field, as revealed 
in “Prediction of Vocational Success” by Dr. Edward L. Thorndike
and five associates. The purpose of the book is to tell what was 
discovered by following a large group of children who were carefully 
studied in 1922. 

The problem consists of an investigation of the possibilities of 
guidance at about age fourteen on the basis of items in the child’s 
records and on the results obtained by subjecting the child to a series 
of psychological tests. The book is technical and detailed and gives 
approximately seventy tables. The deductions are stated clearly and 
definitely. 

Since the conclusions are drawn from facts and not from opinions 
or theories, the book should be extremely interesting to administrators 
of public school systems as well as to employment and personnel 
managers in industry and commerce. 

This book should do much to bring many of the vocational guidance 
leaders and theorists back to earth to face facts as they exist. It 
indeed explodes many pet theories developed in connection with 
extensive prognostic testing programs given during the junior high 
school period. It emphasizes the importance of developing the 
personal characteristics of an adolescent youth as against the develop- 
ment of functional abilities. It points out rather definitely that 
specific occupational training for youth has little or no effect on the 
probable success in employment. 

Quoting from the volume: 


In the case of those persons who worked nine-tenths of the time from age 
eighteen to twenty-two at mechanical or manual work, the items of school record 
and test scores show correlations from .00 to .14. All are then nearly valueless, 
alone, or in combination, as means of forecasting success at mechanical work—no 
combination of facts at age fourteen would enable a vocational counselor to foretell 
much better than a sheer guess how much a boy or girl will earn at mechanical work 
six or eight years later or how happy he will be at it. 


It is surprising to learn from the study that conduct and attend- 
ance records in school had little or no significance in predicting voca- 
tional success, except in the cases of pupils who were extremely 
inferior in all respects. These one or two children out of every 
hundred became loafers or criminals. 


